DNA ENCODING PNEUMOCYSTIS CARINII PROTEASE

This invention relates to a novel Pneumocystis carinii
protease and to nucleic acids encoding it. The invention also relates to
vectors containing the nucleic acids, to cells transformed with the vectors
and to antibodies specific for the protease. In addition, the invention
describes uses of all of the above.

The fungal pathogen Pneumocystis carinii causes potentially
fatal pneumonia in the immunocompromised, including those receiving
immunosuppressive therapy for organ transplantation, those with
advanced malignancy and in particular those with HIV infection. The lack
of an effective in vitro culture system still remains a major obstacle in the
understanding of the biology of P.carinii and its interactions with its host.
Molecular techniques have been employed in the study of the organism,
and a number of genes have now been cloned. Among these is the
multi-gene family encoding the major surface glycoprotein (MSG or gpA) of
the parasite.

The P. carinii major surface glycoprotein is highly
mannosylated and is antigenically distinct in organisms isolated from
different mammalian host species (Lundgren et al., 1991; Gigliotti, 1992).
The MSG multi-gene family has been identified in the genome of P.carinii
sp. f. carinii (rat-derived P.carinii) (Kovacs et al., 1993; Wada et al., 1993;
Sunkin et al., 1994), P.carinii sp. f. mustelae (ferret-derived P.carinii)
(Haidaris et al., 1992; Wright et al., 1995), P.carinii sp. f. hominis
(human-derived P.carinii) (Stringer et al., 1993; Garbe & Stringer, 1994) and
P.carinii sp. f. muris (mouse-derived P.carinii) (Wright et al., 1994). The
different copies of P.carinii sp. f. carinii MSG genes are of similar size but
heterogeneous in sequence. They have been found on multiple
chromosomes and are often organised in tandem arrays. The majority of MSG
genes are located in the subtelomeric regions of the P.carinii sp. f. carinii
chromosomes (Underwood et al., 1996; Sunkin & Stringer, 1996). The
expression of MSG genes has been shown to be mediated by the
upstream conserved sequence (UCS), which is found on a single
chromosome, situated in the subtelomeric region. Different copies of MSG
have been shown to be linked to the UCS. It has been postulated that this
differential expression of MSG may occur in a strategy to evade the
immune response of the host by antigenic variation (Wada et al., 1995;
Sunkin & Stringer, 1996).

Presently there are two standard treatments for
Pneumocystis pneumonia, namely pentamidine or cotrimoxazole. These
drugs were originally used because it was thought that Pneumocystis was
a protozoan; only recently has genetic sequence analysis placed it in the
fungal kingdom. Despite its classification as a fungus, Pneumocystis does
not respond to the usual anti-fungal drugs and hence the drug regimes
have remained all but unchanged. These regimes are particularly
unpleasant, with many patients reacting adversely and thus requiring a switch
in treatment. AIDS patients in particular would benefit from the
development of new anti-Pneumocystis therapies, since a high proportion
of AIDS patients suffer adverse side effects, and many have multiple
episodes of P. carinii pneumonia due to their decreasing CD4+ lymphocyte
count and persistence of immune suppression.

Recently, a novel family of genes from P. carinii sp. f. carinii has
been described (Lugli and Wakefield 1996). The genes are found in the
subtelomeric regions of the P. carinii sp. f. carinii genome, and show
homology to protease genes from a number of fungi.

Wada and Nakamura (1994) describe the discovery of an
open reading frame (designated ORF-3) encoding a protein of unknown
function in P.carinii sp. f. carinii and located close to the MSG genes. The
sequence given (DDBJ/EMBL/GenBank accession nos. D31909 and
D17441) corresponds to a portion of the genes discussed above (Lugli and
Wakefield 1996).

It has now been discovered that there is a P.carinii sp. f.
hominis counterpart to the family of genes in the rat-derived P. carinii
species referred to above, the human-derived P. carinii species having at
least 50% difference to the rat-derived P. carinii species in its nucleotide
sequence. The novel multi-gene family is known as PRT1 (Protease 1);
the genes show high levels of homology with the subtilisin-like serine
proteases.
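
As a rough illustration of what a percentage difference between two aligned nucleotide sequences means, the following Python sketch counts mismatched alignment columns. The function name, the decision to count gap columns as differences, and the toy sequences are assumptions made for illustration; they are not taken from the specification or from PRT1 data.

```python
def percent_difference(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns at which two aligned sequences differ.

    Assumption: both strings come from the same pairwise alignment, use '-'
    for gaps, and a column where only one sequence has a gap is a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    mismatches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x != y
    )
    return 100.0 * mismatches / len(aligned_a)


# Toy example with hypothetical 10-base sequences (not PRT1 sequences).
rat_like = "ATGGCTAAAG"
human_like = "ATGACTTAGG"
print(percent_difference(rat_like, human_like))
```

Other conventions (for example ignoring gap columns) would give different figures, so any threshold such as 50% should be read together with the alignment method used.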

The subtilisin-like serine proteases are a group of
endoproteases which have been characterised from a wide variety of
organisms including bacteria, fungi and higher eukaryotes. They have
been found to function in the specific endoproteolytic processing of
pro-proteins at cleavage sites of paired basic amino acid residues, to generate
regulatory proteins in a mature and biologically active form. The
pro-hormone processing enzyme kexin, encoded by the KEX2 gene of
Saccharomyces cerevisiae, has been characterised and found to cleave the
precursors of the α mating factor and the killer toxin (Fuller et al., 1989).
Genes encoding a similar processing endoprotease have been identified in
a number of other fungi: the KEX1 gene from the yeast Kluyveromyces
lactis (Tanguy-Rougeau et al., 1988), the gene encoding the KEX2-related
protease (krp) from Schizosaccharomyces pombe (Davey et al., 1994) and
the XPR6 gene from Yarrowia lipolytica (Enderlin & Ogrydziak, 1994).
Mammalian homologues have also been identified, including the human fur
gene (fes upstream region) in the region upstream of the fes
proto-oncogene, encoding the enzyme furin (van den Ouweland et al., 1990).
The genes Dfur1 and Dfur2 from the insect Drosophila melanogaster
encoding furin-like proteins (Roebroek et al., 1992) and the bli-4 gene from
the nematode Caenorhabditis elegans have also been studied. Other
members of the subtilisin-like serine protease family have been identified
and the specific endoproteolytic activity of some of them has been
elucidated. However, for many others the precise biological function has
not yet been determined.

The PRT1 gene product may be a specific endoproteolytic
processing enzyme, such as is seen in other subtilisin-like serine
proteases. Given that in genetic organisation some copies of PRT1 are
generally found in the subtelomeric region, just downstream from the MSG
gene, the PRT1 protein encoded by these genes may be involved in the
processing of MSG to its mature form. The multicopy nature of the PRT1
gene may reflect the need for processing enzymes of different specificity
for the different types of MSG. Whatever its precise role, the activity of the
PRT1 protein is undoubtedly essential to the viability and therefore the
pathogenesis of P.carinii.

Recently, there has been considerable interest in targeting
proteases for the control of a number of different diseases and in
particular HIV infection. Combination therapies for HIV treatment employ
protease inhibitors; a large variety of protease inhibitors are therefore
available for testing against new proteases.

The Invention

Part of the catalytic domain of a PRT1 gene has been cloned,
sequenced and characterised from three types of the host specific fungal
pathogen P.carinii, namely P.carinii sp. f. rattus (rat variant), P.carinii sp. f.
muris (mouse) and P.carinii sp. f. hominis (human). The newly discovered
human-infecting P.carinii PRT1 catalytic domain sequence is shown in
figure 1 and nucleotide sequence alignments for rat P. carinii, rat variant
P. carinii, mouse P. carinii and human-infecting P.carinii PRT1 clones are
shown in figure 2. These will enable the sequencing of the remaining parts
of a PRT1, using techniques known to those skilled in the art of molecular
biology.

The invention therefore provides in one aspect an isolated
DNA comprising part or all of a PRT1 gene of a non-rat infecting species of
Pneumocystis carinii.

The invention also provides an isolated DNA comprising a
sequence shown in figure 1, or a non-rat P. carinii sequence shown in
figure 2, or a sequence which hybridises to either of these under stringent
conditions.

In further aspects, the invention provides recombinant vectors
containing PRT1 DNA sequences as described herein, and recombinant
polypeptides which are part or all of a PRT1 gene product, encoded by the
vectors.

In another aspect, the invention provides synthetic peptides 
corresponding to antigenic portions of a PRT1 gene product. 

In further aspects, the invention provides a method of
producing antibodies specifically immunoreactive with a P. carinii protease,
which method comprises using a recombinant polypeptide or a synthetic
peptide as described herein to generate an immune response; and
antibodies produced by the method.

In another aspect, the invention provides a method of
screening for anti-Pneumocystis carinii compounds, which method
comprises providing a source of a recombinant polypeptide expressed by
part or all of a PRT1 gene or cDNA, and contacting the compound with the
recombinant polypeptide.

In another aspect, the invention provides an engineered cell
transfected with a recombinant vector containing PRT1 DNA sequences as
described herein.

In another aspect, the invention provides an engineered cell
line expressing a recombinant polypeptide from part or all of a PRT1 gene
or cDNA, useful in a method of screening for anti-P.carinii compounds such
as protease inhibitors effective against P.carinii.

In another aspect, the invention provides a P.carinii protease
isolated using an antibody specifically immunoreactive with a P.carinii
protease, as described herein.

In another aspect, the invention provides PRT1 clones for
part or all of a human-infecting P.carinii PRT1 gene from the PRT1
multi-gene family.

A part of the PRT1 gene as referred to herein may be for
example a fragment of the gene which codes for a specific domain such as
the catalytic domain, or it may be a shorter sequence such as a sequence
not less than 15 nucleotides in length or not less than 20 nucleotides in
length. Sequences of about 15 or about 20 nucleotides in length are
generally the shortest practical length of oligonucleotide useful as a
sequence specific primer or probe. That is, these are generally the
shortest lengths of sequence that will hybridise specifically to a gene
sequence under stringent conditions.
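
The length guidance above can be sketched as a simple check on a candidate oligonucleotide. The Python below is a minimal illustration, assuming the 15-nucleotide minimum quoted in the text; the melting-temperature estimate uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a common rule of thumb for short oligonucleotides that is not mentioned in the specification. The function names and the example 20-mer are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in deg C by the Wallace rule.

    Tm = 2*(A+T) + 4*(G+C); only a rough guide for short oligonucleotides.
    """
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("oligo must be non-empty and contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def long_enough(oligo: str, min_length: int = 15) -> bool:
    """True if the oligo meets the minimum length quoted in the text."""
    return len(oligo) >= min_length


candidate = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, not a PRT1 primer
print(long_enough(candidate), wallace_tm(candidate))
```

Length alone does not guarantee specificity; in practice a candidate would also be compared against the target genome.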

Within the PRT1 multi-gene family will be related genes which
will be easily identifiable as such by those skilled in the art, but which may
nevertheless differ in location, function and sequence. It will be evident
that all members of the PRT1 multi-gene family, which members may
variously be described as different genes in the family or as different
copies of the PRT1 gene, are included within the scope of the invention.

Known methods to mutate or modify nucleic acid sequences
can be used in conjunction with this invention to generate useful PRT1
mutant sequences. Such methods include but are not limited to point
mutations, site directed mutagenesis, deletion mutations, insertion
mutations, mutations obtainable from homologous recombination, and
mutations obtainable from chemical or radiation treatment.

Furthermore, recombinant DNA techniques are available to
mutate the DNA sequences described herein, to link these DNA
sequences to expression vectors and express the PRT1 protein or part of
the protein, e.g. the catalytic domain or the P-domain.

In the attached figures:

Figure 1 shows the genomic DNA sequence of part of the catalytic domain
of PRT1 from P.carinii sp. f. hominis. (SEQ ID NO: 22)

Figure 2 shows DNA sequence alignments for part of the catalytic domain
of PRT1 from P.carinii. (Found in GenBank AF001305, GenBank
AF001304, and SEQ ID NOS: 23 - 29, in the order in which they appear).

Figure 3 shows amino acid sequence alignments of part of the catalytic
domain of PRT1, translated from the nucleotide sequences in figure 2.
(Found in GenBank and SEQ ID NOS: as for Figure 2).

Figure 4 shows alignment of P.carinii PRT1 derived amino acid sequences
from P.carinii sp. f. carinii clones. (Found in GenBank AF001305,
GenBank AF001304 and SEQ ID NOS: 30, 31, 33 - 47, 32, 48 - 50).

Figure 5 shows DNA sequence alignments for P.carinii sp. f. carinii PRT1
clones. (Found in GenBank AF001305, GenBank AF001304 and SEQ ID
NOS: 30 - 32)

Figure 6 shows a schematic representation of the P.carinii sp. f. carinii
PRT1 gene.

Figure 7 shows expressed recombinant PRT1 fragments.

By analogy to P.carinii sp. f. carinii there are expected to be
many copies of the PRT1 gene within the P.carinii sp. f. hominis genome.
Some of these copies may be significantly different and form a number of
different sub-types. They will all, however, be classed as members of the
PRT1 multi-gene family by virtue of homology at some domains of the
gene, for example the catalytic domain.

Seven different domains have been identified to date in the
P.carinii sp. f. carinii PRT1 amino acid sequence, namely:

i) N-terminal hydrophobic domain
ii) Pro-domain
iii) Catalytic domain
iv) P-domain
v) Proline-rich domain
vi) Serine-threonine rich domain
vii) C-terminal hydrophobic domain

The P.carinii sp. f. hominis homologues may have fewer, the
same number or more domains. Although some domains in some
members of the P.carinii sp. f. hominis PRT1 gene family may be absent or
some extra domains may be present, these genes will still be considered to
be members of the PRT1 multi-gene family.

The proteins encoded by different copies of this gene family
may have a variety of different functions, including:

i) as a constituent of the outer cell surface of the parasite, and
attached to the cell membrane by a
glycosyl-phosphatidylinositol (GPI) anchor

ii) the proteolytic processing within a P. carinii sub-cellular
organelle of the P.carinii major surface glycoprotein (MSG)
to its mature form, possibly at a conserved dibasic amino acid
site in the upstream conserved sequence of MSG

iii) in the interaction of the parasite with its host, forming a
specific ligand on the parasite cell surface which binds to a
host receptor molecule

There may be other functions of the members of this gene
family which have not yet been recognised. These may include functioning
as a protease on as yet unidentified pro-proteins, or as a structural
glycoprotein at some life-cycle stage of the parasite.

It has been demonstrated that the protease is a surface
protease.

Therapeutic intervention 
The PRT1 protein presents a target for a variety of different
therapeutic interventions, which may include:

i) Inhibitors of protease activity

It is postulated that the proteolytic activity of PRT1 is
essential for the viability of the parasite. The predicted
structure of the catalytic domain of the PRT1 protein
suggests that there are subtle differences compared to other
such proteases so far studied. These differences may be
exploited in the design of specific drugs, with less toxic
side-effects than seen in the presently available treatments.

ii) Vaccines

Available data indicate that some copies of PRT1 may
comprise a major surface antigen and therefore provide a
potential target for vaccine development.

iii) Immunotherapy

Passive immunisation with antibodies to PRT1 may be
protective.

iv) Analogues

Analogues designed to imitate PRT1 may be active in
blocking the adherence of P.carinii organisms to a receptor
on the human cells.



Identification of a subtilisin-like serine protease in P.carinii sp. f. carinii

METHODS 

P.carinii DNA extraction 

P.carinii infection was induced in Sprague Dawley rats by
steroid immunosuppression. The organisms were isolated and purified
from infected rat lung tissue by the method described by Peters et al.
(1992).

WO 98/39424    PCT/GB98/00704

Genomic P.carinii DNA was extracted by digestion with proteinase
K (1 mg/ml) in the presence of 0.5% SDS and 10mM EDTA, pH8.0, at
50°C for 16h, followed by phenol:chloroform extraction and ethanol
precipitation. P.carinii DNA for use in PFGE experiments was prepared in
SeaPlaque GTG agarose as described by Banerji et al. (1993).

For oligonucleotide primers, see Table 1 and Lugli et al., 1997.

Isolation of copies of the PRT1 gene from P.carinii sp. f. carinii
genomic and cDNA libraries

A copy of the PRT1 gene was isolated from an unamplified
genomic library from P.carinii sp. f. carinii constructed in λEMBL3 (Banerji
et al., 1993). The library was screened with a cDNA clone containing a
region of a P.carinii sp. f. carinii MSG gene (GenBank Accession number
GBPLN:PMCANTIA, donated by Dr C J Delves and Dr F Volpe). A
relatively high number of recombinant plaques gave positive hybridization
signals compared to the positive recombinant plaques when the library was
screened with a probe derived from the single copy arom locus (Banerji et
al., 1993). Five recombinant phages were isolated from the tertiary screen
and the DNA was subcloned into the plasmid vector pBluescript II.

In order to isolate a full cDNA clone, a P. carinii sp. f. carinii
cDNA library constructed in λZAPII (donated by Dr C J Delves and Dr F
Volpe, see Dyer et al., 1992) was screened with PCR products derived
from amplification of the 5' end of the gene with oligonucleotide primer pair
pcprot9 and prp4r (9/4r product), and of the 3' end of the gene with
pcprot13/RI and pcprot12/RI (13/12 product). The primary screening was
carried out using both probes, and the secondary and tertiary screens were
carried out using only the 9/4r product. The number of positive clones
when screening the cDNA library with the two probes appeared to be
relatively high when compared to the number obtained using a single copy
gene. Four recombinant phage isolated from the cDNA library were
partially characterized. The recombinant DNA was recovered from the λ
phage by in vivo excision as pBluescript plasmid DNA. The size of the
recombinant DNA ranged from 2.7kb to 2.9kb, and sequence analysis
revealed that all four clones contained a polyA tail. One recombinant, 73j,
was selected for further analysis and the recombinant DNA was sequenced
in full from both strands.

DNA amplification

Oligonucleotide primers were designed to various regions of the
P.carinii PRT1 nucleotide sequences. Some oligonucleotides had an
EcoRI restriction endonuclease site incorporated at the 5' end to facilitate
cloning of the amplification products into EcoRI-digested plasmid vectors
pBluescript SK(-) (Stratagene) or pUC18 (Pharmacia). The final
concentration of the amplification reaction mix was 50mM KCl, 10mM Tris
(pH8.0), 0.1% Triton X-100, 3mM MgCl2, 400µM (each) deoxynucleoside
triphosphate, 1µM oligonucleotide primer and 0.025 U Taq polymerase
µl⁻¹ (Promega, UK). With primer pair pcprot9 and pcprot10, forty cycles of
amplification were performed at 94°C for 1.5 min, 53°C for 1.5 min and
72°C for 2.0 min. With primer pair pcprot9 and pcprot4r the same
conditions were used, except that the annealing temperature was 50°C.
With all other primer pairs, ten cycles of amplification were carried out at
94°C for 1.5 min, 55°C for 1.5 min and 72°C for 2.0 min, followed by 30
cycles of 94°C for 1.5 min, 63°C for 1.5 min and 72°C for 2.0 min.
Negative controls were included in each experiment.
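
The three cycling programmes described above can be written down as data, which makes the total hold time easy to check. The following Python sketch is illustrative only: the class names and programme labels are hypothetical, and the totals ignore ramp times and any initial denaturation or final extension, none of which are stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in deg C
    minutes: float  # hold time in minutes


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple

    def minutes(self) -> float:
        """Total hold time for this stage (cycles x sum of step times)."""
        return self.cycles * sum(step.minutes for step in self.steps)


def cycle(anneal_c: float) -> tuple:
    """Denature at 94, anneal at the given temperature, extend at 72."""
    return (Step(94, 1.5), Step(anneal_c, 1.5), Step(72, 2.0))


# Labels are hypothetical; the temperatures and cycle counts are from the text.
PROGRAMMES = {
    "anneal_53": [Stage(40, cycle(53))],
    "anneal_50": [Stage(40, cycle(50))],
    "two_stage_55_then_63": [Stage(10, cycle(55)), Stage(30, cycle(63))],
}


def total_minutes(stages) -> float:
    return sum(stage.minutes() for stage in stages)


for name, stages in PROGRAMMES.items():
    print(name, total_minutes(stages))
```

Each programme comprises 40 cycles of 5 minutes, so under these assumptions all three come to the same total hold time.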

The entire putative gene was amplified as three overlapping
fragments, Prp5e (1626 bp), M14 (1279 bp) and Prp2g (251 bp).
Oligonucleotide primer pairs pcprot9 with pcprot10, followed by pcprot6/RI
with pcprot4/RI, were used in a nested PCR to amplify the 5' fragment,
designated Prp5e, of length 1626 base pairs (bp). The second portion,
called M14, spanning 1279 bp of the central region of PRT1, was amplified
using a nested PCR with primer pairs pcprot2/RI with pcprot14/RI, followed
by pcprot7/RI with pcprot12/RI. The third fragment, Prp2g, encompassing
the 3' end of the sequence (251 bp), was amplified using oligonucleotide
primers pcprot13/RI and pcprot14/RI (Table 1 and Lugli et al., 1997).

Five different overlapping regions of the PRT1 gene were
also amplified and cloned, and the DNA sequences were determined. The first
region, amplified with primer pair pcprot1/RI and pcprot3/RI, spanned
approximately half of the subtilisin-like catalytic domain; the second region,
amplified with primer pair pcprot2/RI and pcprot4/RI, spanned the end of
the subtilisin-like catalytic domain and the start of the P-domain; the third
region, amplified with primer pair pcprot7/RI and pcprot8/RI, spanned the
P-domain; the fourth region, amplified with primer pair 36ex/RI and P13/RI,
spanned the proline-rich domain; and the fifth region, amplified with primer
pair pcprot13/RI and pcprot14/RI, spanned the C-terminal hydrophobic
domain. The sequences Prp1a, Prp3a, Prp7a, Prp2c, Prp3c, Prp4c,
Prptaf2, Prpf4, Prp5f, Prpg3 and Prp5g were amplified from the
P. carinii cDNA library, and sequences Pcr-19, Pcr-14, Pcr-5, Pcr-3, Pcr-1,
Lam-1 and Prpg4 from the P.carinii genomic DNA (Figure 4).

DNA sequence analysis

DNA sequence analysis was performed using the dideoxy chain
termination method. Sequence data were obtained in full from both strands
for all sequences. Analysis of the sequence data was carried out using the
University of Wisconsin Genetics Computing Group (UWGCG) Sequence
Analysis Software Package, Version 8, 1994, Genetics Computer Group,
Madison, Wisconsin.

Pulsed Field Gel Electrophoresis

P. carinii sp. f. carinii organisms were isolated from an
infected rat lung and the chromosomes were separated by pulsed field gel
electrophoresis (PFGE), using a Contour Clamped Homogeneous Electric
Field (CHEF) DRII apparatus (Bio-Rad, UK) operated at 4°C.
Electrophoretic separation was achieved using a 0.9% Seakem agarose gel
with an initial switching time of 10 sec increasing to a final switching time of 80
sec at 180 V for 48 hours. A karyotype corresponding to P.carinii sp. f.
carinii form 1 was observed (Cushion et al., 1993).

Southern hybridisation

Southern blotting and hybridization were carried out using
standard techniques (Sambrook et al., 1989). PFGE blots were hybridised
with three probes derived from different domains of the PRT1 gene. The
product 9/4r was derived from amplification of the 5' end of the PRT1 gene
with primer pair pcprot9 and pcprot4r/RI, product 2/4 from amplification of
the central catalytic region with primer pair pcprot2/RI and pcprot4/RI, and
product 13/12 from amplification of the 3' end of the gene with primer pair
pcprot13/RI and pcprot12/RI. The amplification products were gel-purified
(GeneClean II, BIO101) and labelled with [α-32P]-dCTP by random priming
(Megaprime, Amersham). Hybridisation was carried out at 45°C and
stringency washing at 60°C in 0.2xSSC and 0.1% SDS.

Southern blots of genomic P.carinii DNA digested with
restriction endonuclease PstI or BamHI were probed with oligonucleotide
probes pcprot3/RI, pcprot5/RI, pctel2, and msgterm, labelled with
[γ-32P]-dATP using polynucleotide kinase. Hybridisation was carried out at 46°C
and stringency washing at 52°C in 5xSSC and 0.5% SDS.


RESULTS 

Analysis of DNA and deduced amino acid sequence of copies of the 
PRT1 gene 

We have identified a family of genes in the P.carinii sp. f.
carinii genome which shows homology to the subtilisin-like serine
proteases. We have named this gene family PRT1 (protease 1). A copy
of the PRT1 gene (Paga) was isolated from a P.carinii genomic library, the
open reading frame (3069bp) containing seven short putative intervening
sequences. A copy of the PRT1 gene (73j) was also isolated from a cDNA
library, of length 2370bp. Portions of the gene were amplified by PCR from
the cDNA library as three overlapping fragments, at the 5' end (Prp5e), the
central region (M14) and the 3' end (Prp2g). Five other regions of the gene
were also amplified, from either the P.carinii cDNA or genomic libraries.

Analysis of the DNA sequence of the copy of the PRT1 gene
from the genomic library, PRT1(Paga), and of the copy from the cDNA
library, PRT1(73j), confirmed the presence of seven short introns in the
genomic DNA sequence. The introns ranged in length from 38 bp to 45
bp, with a base composition ranging from 71% to 84% A+T. In all seven
introns, the dinucleotide GT was present at the 5' splice donor site and AG
at the 3' splice acceptor site. The sequence YTRAT, which has been
identified as the putative lariat forming motif in other P.carinii sp. f. carinii
introns (Zhang & Stringer, 1993), was present in the first, second, fourth,
fifth and seventh introns. The eukaryotic lariat consensus sequence,
YYRAY, was identified in the third and sixth introns.
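
The intron features listed above (GT...AG boundaries, A+T content, and the YTRAT and YYRAY motifs) can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the motifs use the standard IUPAC codes Y = C or T and R = A or G, and the 40-base example is a made-up sequence, not a PRT1 intron.

```python
import re

# IUPAC codes needed for the two motifs; other ambiguity codes are omitted.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def motif_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif such as 'YTRAT' into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in motif.upper()))


def describe_intron(intron: str) -> dict:
    """Summarise length, splice-site dinucleotides, A+T content and motifs."""
    seq = intron.upper()
    if not seq:
        raise ValueError("intron sequence is empty")
    at = seq.count("A") + seq.count("T")
    return {
        "length": len(seq),
        "gt_ag": seq.startswith("GT") and seq.endswith("AG"),
        "at_percent": round(100.0 * at / len(seq), 1),
        "has_YTRAT": bool(motif_regex("YTRAT").search(seq)),
        # Every YTRAT match is also a YYRAY match, since T is a pyrimidine.
        "has_YYRAY": bool(motif_regex("YYRAY").search(seq)),
    }


# Hypothetical 40-base intron used only to exercise the function.
toy_intron = "GTAAGTATTCATTTTCTAATGAATTTCATATTTGTTATAG"
print(describe_intron(toy_intron))
```

Because YTRAT is a special case of YYRAY, a report that an intron carries only the YYRAY consensus implies that the stricter YTRAT form is absent from it.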

The sequence of the cDNA clone, PRT1(73j), contained an
open reading frame of 2370bp, which on translation resulted in a peptide of
790 amino acids (Figure 4). The deduced amino acid sequence was
compared to sequences in the GenBank and EMBL databases and
showed homology to fungal and other eukaryotic subtilisin-like serine
proteases. The A+T content of the ORF was 64%, with a high A+T content
at the third base position of the codons. The base composition of the 5'
upstream sequence was 74% A+T, and the 3' downstream sequence was
75% A+T. A consensus polyadenylation signal, AATAAA, was observed
88bp downstream of the stop codon.
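
The figures quoted above follow from simple sequence arithmetic, which can be sketched in Python. The helper names below are hypothetical and the short test strings are made up; only the 2370 bp ORF length, the 790-residue peptide and the AATAAA signal come from the text.

```python
def codon_count(orf_length_bp: int) -> int:
    """Number of codons in an ORF whose length is a multiple of three."""
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_length_bp // 3


def at_percent(seq: str) -> float:
    """Percentage of A and T bases in a DNA sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence is empty")
    return 100.0 * (s.count("A") + s.count("T")) / len(s)


def third_position_at_percent(orf: str) -> float:
    """A+T percentage at the third base of each codon, reading from base 1."""
    return at_percent(orf[2::3])


def polya_signal_offset(downstream: str, signal: str = "AATAAA") -> int:
    """0-based offset of the first polyadenylation signal, or -1 if absent."""
    return downstream.upper().find(signal)


# 2370 bp / 3 bases per codon gives the 790 residues quoted for PRT1(73j).
print(codon_count(2370))
```

Applied to the sequence downstream of the stop codon, `polya_signal_offset` returns a 0-based offset, so it should be compared with the distance reported in the text with that convention in mind.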

The deduced amino acid sequence of the genomic clone
PRT1(Paga), the cDNA clone PRT1(73j), the three fragments obtained by
PCR amplification of the cDNA library and the other recombinant clones
generated by DNA amplification were compared (Figure 4). Several
regions of homology were found and also a number of regions in which
significant divergence was observed. These data suggested that the
sequences were derived from different copies of the PRT1 gene.

Comparison with other subtilisin-like serine proteases

The deduced amino acid sequence of the cDNA clone
PRT1(73j) was aligned with nine other subtilisin-like serine proteases
including fungal, mammalian, insect and nematode sequences. The PRT1
sequences showed homology with all the other sequences, with a high
level of homology in the subtilisin-like catalytic domain. The three essential
residues of the catalytic active site, aspartic acid (Asp214), histidine (His252)
and serine (Ser423), were conserved in all the PRT1 sequences. The
highest levels of homology between all the sequences were around these
residues.

The structural organisation of the fungal sequences showed
domains characteristic of this class of processing endoproteases: a
hydrophobic signal sequence, a pro-domain that may be cleaved by
autoproteolysis, a subtilisin-like catalytic domain, a P-domain which is
known as such because it is essential for proteolytic activity, a
serine/threonine-rich domain which may potentially be modified by O-linked
glycosylation, a carboxy-terminal hydrophobic trans-membrane domain
and a C-terminal tail with acidic residues (Van de Ven et al., 1993). The
P.carinii PRT1 sequences showed a putative similar structural organisation
but, unlike the nine other subtilisin-like serine proteases, they also had a
proline-rich domain preceding the serine-threonine rich domain and the
C-terminal hydrophobic domain (Figure 6). The P.carinii PRT1(73j) sequence
had a hydrophobic signal sequence at the N-terminus, followed by a
putative pro-domain, a subtilisin-like catalytic domain from Ser„, to His474, a
P-domain from residue Tyr475 to Ser e3 „, a proline-rich domain from residue
Pro^, to Pro707, a serine-threonine rich domain from residues Thr708 to
Ser785, and a carboxy-terminal hydrophobic domain from residues His m to
Phe790.

Analysis of subtilisin-like catalytic domain

The three-dimensional structures of four subtilisin-like serine
proteases have been determined: subtilisin BPN'/Novo from Bacillus
amyloliquefaciens (Hirono et al., 1984; Bott et al., 1988), subtilisin
Carlsberg from B. licheniformis (McPhalen & James, 1988), thermitase
from Thermoactinomyces vulgaris (Gros et al., 1989; Teplyakov et al.,
1990) and proteinase K from Tritirachium album (Betzel et al., 1988). The
amino acid sequences of these four proteases have been compared to those of
31 other subtilisin-like serine proteases isolated from bacteria, fungi and
higher eukaryotes, and the essential core structure of the catalytic domain
of this group of molecules has been identified (Siezen et al., 1991).

We have compared the deduced amino acid sequence of the
P.carinii PRT1(73j) gene with the multiple sequence alignment of the other
subtilisin-like serine proteases and have identified the three essential
residues of the catalytic active site, aspartic acid, histidine and serine, in the
PRT1 sequence (Asp214, His252 and Ser423). On the basis of the sequence
alignment, the P.carinii PRT1 sequence could be assigned to the class I
subtilases, within the subgroup I-E which contained the pro-hormone
processing proteases from yeasts and higher eukaryotes (Siezen et al.,
1991).

Eight α-helical domains and nine β-sheet regions have been
defined as the structurally conserved regions within the essential core
structure. The variable regions which connect the core segments have
been found to differ both in length and in amino acid sequence (Siezen et
al., 1991). High levels of homology were observed between the PRT1
sequences and the other sequences in the regions of the two conserved
internal helices, helix C (residues 252 to 262) and helix F (residues 422 to
438). Eleven amino acid residues have previously been found to be totally
conserved in all the characterized subtilisin-like serine proteases, and most
but not all are conserved in the PRT1 sequences. These amino acid
residues are at the active site, Asp214, His252 and Ser423 [found in all the
PRT1 sequences except PRT1(Prp7a)], and in the internal helices at
residues Gly253, Gly^ and Pro427. The residues Ser310, Gly312, Gly 35j[, Gly 42 , and
Thr422, involved in substrate binding, were conserved in all the PRT1
sequences, except Thr 4?3 which was found only in two sequences
generated by PCR, PRT1(Prp1a) and PRT1(Prp7a).

In addition to the totally conserved residues, seven other
amino acid residues have been identified which are highly conserved; of
these, six were conserved in the P.carinii PRT1 sequences and included
the oxyanion hole residue (Asn^), residues near the active site, Gly218 and
Thr254, and also residues Gly J0S , Gly 2 „ and Gly^. Seven conserved
cysteine residues were found in all the P.carinii PRT1 sequences: Cys256,
Cys268, Cysjog, Cys^, Cys^, Cys M and Cys 4 , 5 . Nineteen variable regions,
generally located in loops on the surface of the molecule, have been
identified in the subtilase family, of which 14 were found in the P.carinii
PRT1 sequences. Three positions have been identified at which charge is
totally conserved in all the subtilisin-like proteases examined, and these
were also conserved in the P.carinii PRT1 sequences: the positive charge
on Arg^ and the negative charges on residue Asp214 (active site) and
Aspzj,.

It has been proposed that the high specificity of the class I-E
subtilisin-like serine proteases for paired basic residues Lys-Arg or Arg-Arg
may be facilitated by a high density of negative charge at the
substrate-binding face, provided by nine highly conserved Asp residues and one Glu
residue (Siezen et al., 1991). Two of the Asp residues, Asp 353 and Asp 4M,
were found in all the P.carinii PRT1 sequences, and also the Glu 293. In
addition, four other Asp residues were found in some but not all of the
copies of PRT1.

Analysis of the domains flanking the subtilisin-like catalytic domain

The putative domains of the PRT1(73j) polypeptide are
summarised in Figure 6. A hydrophobicity plot of the PRT1(73j) sequence
revealed a hydrophobic region at the N-terminus, suggesting that this may
be a signal sequence. Residues 1 to 23 of the N-terminus of the sequence
showed a high level of homology to the N-terminus of the P.carinii sp. f.
carinii multifunctional folic acid synthesis fas gene, which encodes
dihydroneopterin aldolase, hydroxymethyldihydropterin pyrophosphokinase
and dihydropteroate synthase (Volpe et al., 1992, 1993). This region was
followed by the presumptive pro-domain, which may be cleaved by
autocatalysis. Potential autocatalytic sites of paired basic residues were
identified in the PRT1(Paga) and PRT1(Prp5e) sequences at Lys 115 - Arg 116
and Arg 136 - Arg 137, but were absent in the PRT1(73j) sequence. Five other
semi-conserved autocatalytic sites were found in some copies, but not all,
of the P.carinii PRT1 sequences: two in the catalytic domain (Lys 400 -
Arg 401, Arg 473 - Arg 474) and three in the P-domain (Arg 521 - Arg 522,
Arg 554 or Lys 554 - Arg 555, Arg 576 - Arg 577). One potential autocatalytic
site, at the start of the carboxy-terminal hydrophobic region (Lys 769 -
Arg 770), was found in all the sequences. The PRT1(73j) sequence contained
two of the potential autocatalytic sites, Arg 576 - Arg 577 and Lys 769 - Arg 770.
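The search for paired basic residues described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the toy sequence are hypothetical, and the sketch simply reports every Lys-Arg or Arg-Arg pair in a one-letter amino acid string without judging whether a site is actually cleaved.

```python
import re


def find_dibasic_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the first residue, motif) for each KR or RR pair."""
    # The lookahead allows overlapping pairs (e.g. in "RRR") to be reported separately.
    return [
        (match.start() + 1, match.group(1))
        for match in re.finditer(r"(?=([KR]R))", protein.upper())
    ]


# Hypothetical toy sequence for illustration; it is not a PRT1 sequence.
toy_sequence = "MSTKRLLAGRRPEK"
print(find_dibasic_sites(toy_sequence))  # [(4, 'KR'), (10, 'RR')]
```

Positions are reported 1-based so that they can be compared directly with residue numbers such as Lys 115 - Arg 116.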

The PRT1 sequences showed homology with the other
subtilisin-like serine proteases in the region of the P-domain, the highest
homology being with the derived amino acid sequence of the S. pombe krp
gene. Four potential sites for N-linked glycosylation were observed in all
the PRT1 sequences, three in the subtilisin-like catalytic domain (Asn 154,
Asn 277, Asn^), and one in the P-domain (Asn^).

A serine-threonine rich region was also identified in the
PRT1(73j) sequence from residue Thr 708 to Ser 755, and the hydrophobicity
plot of the PRT1(73j) sequence revealed a hydrophobic region at the
C-terminal end, residues His^ to Phe 791, suggesting a membrane-associated
domain. Unlike most other serine protease sequences, however, all the
copies of the PRT1 polypeptide contained a proline-rich region
downstream of the P-domain.
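A hydrophobicity plot of the kind referred to above is usually a sliding-window average of per-residue hydropathy values. The text does not name the scale or window used, so the sketch below assumes the widely used Kyte-Doolittle scale and a hypothetical window of 9 residues; the function name is also illustrative.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence."""
    seq = protein.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]  # KeyError on non-standard letters
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy sequence: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("MLLLIVVALLAGKKDEERRDE", window=9)
print([round(value, 2) for value in profile])
```

Runs of strongly positive window averages at the N-terminus or C-terminus would correspond to the candidate signal sequence and membrane-associated region discussed in the text.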
Genetic organization of the PRT1 multi-gene family

Analysis of the alignments of the DNA and the deduced
amino acid sequences of copies of the PRT1 gene from genomic DNA,
the cDNA sequence and the three fragments obtained by PCR of the
cDNA library revealed domains in the PRT1 gene which were highly
conserved and also regions where significant divergence was observed,
again suggesting that PRT1 comprises a multi-gene family (Figure 4). The
subtilisin-like catalytic domain and the P-domain appeared to be conserved,
whereas high levels of heterogeneity were observed in the proline-rich
domain and the C-terminal domain. The variation in this region was both in
length and in sequence. A number of repeated DNA sequence motifs were
found in the proline-rich region. Nucleotide sequences encoding
polyproline were found in all the sequences, and also the dipeptides Pro-Glu
and Pro-Gln and the tetrapeptides Pro-Glu-Pro-Gln and Pro-Glu-Thr-Gln.
The order and number of tandem repeats varied in each sequence.
The overall length of this region varied from approximately 67 amino acid
residues in the shortest sequence, PRT1(73j), to 233 residues in the
longest sequence, PRT1(M14).

In order to further substantiate the presence within the
P.carinii genome of multiple copies of the PRT1 gene, P.carinii sp. f. carinii
chromosomes, separated by pulsed field gel electrophoresis, were
analysed by hybridisation with three probes derived from different domains
of PRT1. All three probes showed similar patterns of hybridization,
annealing at high stringency to all the chromosome bands except for one,
the third smallest in size, approximately 350 kbp. This provided further
evidence that the P.carinii sp. f. carinii genome contained many copies of
the PRT1 gene, which were present on most of the P.carinii sp. f. carinii
chromosomes.

The sequences of the PRT1 gene family showed high levels
of homology with ORF3, which has been demonstrated to be contiguous
with a copy of the gene encoding the major surface glycoprotein MSG 100
(Wada & Nakamura, 1994). This gene arrangement was reported in 15
other λ clones, in which a gene showing high homology to ORF3 was
located downstream of a copy of MSG (Wada & Nakamura, 1994). Most
copies of the MSG genes have been demonstrated to be located in the
P.carinii sp. f. carinii subtelomeric regions (Underwood et al., 1996; Sunkin
& Stringer, 1996). The copy of the PRT1 gene encoded by the
PRT1(Paga) sequence was cloned from a λ EMBL3 genomic library as a
single 14 kb fragment and was approximately 1150 bp downstream of a
copy of MSG. Four other λ clones isolated from the same library contained
a copy of PRT1 contiguous with a copy of MSG.

P.carinii sp. f. carinii genomic DNA was digested with either
restriction endonuclease PstI or BamHI and probed sequentially with four
oligonucleotide probes, derived from the 5' end of the PRT1 gene (pcprot5/RI),
from the catalytic domain of the gene (pcprot3/RI), an MSG probe
(msgterm) and a subtelomeric probe (Pctel2). All probes hybridised to
multiple bands. The hybridisation patterns of some of the bands, ranging in
size from 7 kb to greater than 12 kb, were the same for all four probes.
However, hybridisation to other fragments was not coincident, with the
PRT1 probes alone hybridising to some high molecular weight fragments
and also low molecular weight fragments of less than 7 kb.

DISCUSSION 

We describe the cloning and characterisation of copies of the
PRT1 multi-gene family from P.carinii sp. f. carinii. A copy of the PRT1
gene was isolated from a P.carinii sp. f. carinii genomic library. A different
copy was isolated from a cDNA library, indicating that this copy of the gene
was transcribed, and also identifying the presence of seven short introns in
the genomic sequence. Consistent with many other P.carinii genes, the
coding region and the flanking sequences of the PRT1 sequences showed
a strong bias for adenine or thymine, in particular at the third base
position of the codons. Similarly, the presence of short A+T rich introns
has been reported in other P.carinii genes. In the PRT1 sequences, the
introns were not distributed throughout the gene: six of the seven
introns were found in the subtilisin-like catalytic domain, and the seventh in
the P-domain. The introns may play a role in restricting the variation in this
region of the gene, whereas no introns were observed in the highly
heterogeneous proline-rich region (Rogers, 1985).
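The A+T bias mentioned above, overall and at the third codon position, can be quantified as follows. This is a sketch under stated assumptions: the function names are hypothetical, the input is taken to be an in-frame coding sequence starting at the first base of a codon, and the example string is a toy sequence rather than PRT1 data.

```python
def at_fraction(dna: str) -> float:
    """Fraction of A or T bases in a non-empty DNA string."""
    seq = dna.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def third_position_at_fraction(cds: str) -> float:
    """Fraction of A or T at the third base of each complete codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # ignore a trailing partial codon
    third_bases = seq[2:usable:3]      # bases 3, 6, 9, ... (1-based)
    return at_fraction(third_bases)


# Hypothetical toy coding sequence (three alanine codons: GCA GCT GCG).
toy_cds = "GCAGCTGCG"
print(round(at_fraction(toy_cds), 3))                 # 0.222
print(round(third_position_at_fraction(toy_cds), 3))  # 0.667
```

Comparing the two numbers for a real coding region shows whether the bias is concentrated at synonymous third positions, as the text reports for PRT1.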

The high level of homology of the P.carinii PRT1 sequences
to the subtilisin-like serine proteases, in particular in the region of the
catalytic domain, strongly suggested that this gene encoded a protease of
this type. The predicted P.carinii PRT1 polypeptide sequences possessed
the three essential residues of the catalytic active site as well as many
other highly conserved motifs. The domain organisation of the PRT1 gene
strongly resembled that of the fungal prohormone processing proteases,
with the exception of the proline-rich domain. This proline-rich region is
very uncommon in the subtilisin-like serine protease superfamily, although
the XPR6 gene from Y. lipolytica is reported to contain a short region of a
tetrapeptide repeat, the consensus sequence of the four amino acids being
Glu (Asp/Glu) Lys Pro (Enderlin and Ogrydziak, 1994). A proline-rich
region has also been found in the carboxy-terminal tail domain of the
mammalian serine protease acrosin, a proteolytic enzyme of sperm cells,
located in the acrosome at the apical end of the spermatozoon (Klemm et
al., 1991).

In the African trypanosome, Trypanosoma brucei, a proline-rich
domain has been identified in the procyclic acidic repetitive proteins
(PARPs). These proteins are found on the cell surface of the insect form
of the parasite and are encoded by a family of polymorphic genes which
contain a variable region with heterogeneity both in length and sequence.
The variable region contains the proline-rich domain and is primarily
composed of the dipeptide Glu-Pro (Roditi et al., 1988).

Unlike any of the other fungal prohormone processing
proteases, which appear to be single copy genes, the data reported in this
study suggest that the PRT1 sequence is present in many copies, which
are similar but not identical, in the genome of P.carinii sp. f. carinii. The
relatively large number of recombinants present in both the genomic and
the cDNA libraries suggested a multi-copy gene, and this was substantiated
by PFGE data, revealing that at least one copy of a PRT1 gene was
present on all but one of the P.carinii chromosomes. Southern
hybridisation of restriction endonucleolytic digests of P.carinii sp. f. carinii
DNA probed with PRT1 sequences also confirmed the presence of many
copies of the gene. Analysis of sequence data generated by the
amplification of the locus showed heterogeneity, suggesting that a variety
of different copies of the gene were present in the P.carinii genome. Some
domains, including the subtilisin-like catalytic domain and the P-domain,
were highly conserved between gene copies, whereas the highest levels of
divergence were observed in the proline-rich domain, which varied both in
length and in sequence.

Of five genomic clones analyzed in this study, all possessed
a copy of PRT1 contiguous with a MSG gene. It has been reported that 15
independent genomic clones which encoded MSG were contiguous with
the ORF3 sequence, which, from our analysis, appears to encode the
proline-rich domain of PRT1 (Wada & Nakamura, 1994). It has been
demonstrated that most copies of MSG are subtelomeric (Underwood et
al., 1996; Sunkin & Stringer, 1996). It is therefore highly likely that many
copies of the PRT1 multi-gene family are located in the subtelomeric
regions of the P.carinii sp. f. carinii genome. However, PFGE analysis has
shown that not every P.carinii sp. f. carinii chromosome contained a copy
of PRT1, and the preliminary characterisation of a clone of one of the
subtelomeric regions of P.carinii sp. f. carinii has not revealed a copy of
PRT1 (Underwood & Wakefield, unpublished results). Hybridisation of
MSG and subtelomeric probes to endonuclease-digested P.carinii sp. f.
carinii DNA resulted in positive hybridisation to fragments greater than
approximately 7 kb in size. Probes derived from the PRT1 sequence
hybridised to these bands but also to low molecular weight fragments,
again suggesting that not all copies of PRT1 are subtelomeric.

The P.carinii PRT1 gene family shows some striking
similarities to that of MSG. Both are composed of many genes, copies of
which are found on most P.carinii chromosomes and show sequence
heterogeneity. Some copies of PRT1 are contiguous with MSG and are
located in the subtelomeric regions of the P.carinii chromosomes.

It is interesting to note that one of the major components of the cell
surface of Leishmania has proteolytic activity. The Leishmania major
surface protease (msp or gp63), a zinc endoprotease, is found in all
species of Leishmania and is encoded by a family of genes, some of which
are tandemly arrayed (Bouvier et al., 1989; Webb et al., 1991). Expression
of different copies of the gene is regulated during the development of the
parasite, and different isoforms of the protein are found in the promastigote
stage in the gut of the sand fly and in the amastigote stage in the
phagolysosomes of the macrophages (Frommel et al., 1990; Roberts et
al., 1995; Ramamoorthy et al., 1995). The major surface protease is
thought to play an important role in the virulence of Leishmania by
involvement in the degradation of components of the extracellular matrix
and by facilitating promastigote attachment to host macrophages
(McMaster et al., 1994). Immunisation with MSP protein confers partial
protection of mice against Leishmania infection (Abdelhak et al., 1995).

The proteins encoded by the P.carinii PRT1 gene family show
highest homology to the subtilisin-like serine proteases. A wide diversity of
different types of precursor proteins are processed by this family of
proteases to mature and active regulatory proteins, but the precise function
of many of these proteases has not yet been determined. Some of the
fungal homologues have been shown to function in the processing of
several proteins, such as the S. cerevisiae KEX2 gene product, which
processes both the pheromone α-factor and the killer toxin (Fuller et al.,
1989). The krp gene product from S. pombe, which cleaves the pheromone
precursor pro-P-factor to its active form, is thought to also function in the
processing of other regulatory proteins, since its activity is essential for cell
viability (Davey et al., 1994). The XPR6 gene product from Y. lipolytica,
although not essential for cell viability, when disrupted was found to cause
aberrant growth and morphology (Enderlin and Ogrydziak, 1994). The
function of the products of the P.carinii PRT1 gene family is not yet
understood, but they are likely to play an important role in the life cycle and
possibly also the pathogenicity of the organism.

Identification of a PRT1 gene from P.carinii sp. f. hominis

PCR strategies using degenerate primers designed using
P.carinii sp. f. carinii PRT1 sequence information failed to isolate any
P.carinii sp. f. hominis PRT1 clones. The strategies employed included
single round PCR and nested PCR on post mortem samples from infected
patients.

Given the failure of these approaches, it was decided to try to
obtain additional sequence data from P.carinii derived from other
organisms.

MATERIALS AND METHODS
Samples

Samples of Pneumocystis carinii sp. f. hominis were derived
from HIV-positive patients by fibreoptic bronchoscopy, an aliquot of this
bronchoalveolar lavage (BAL) sample being immediately frozen,
stored at -20°C and transported to the Institute of Molecular Medicine for
DNA extraction (samples D503B and 0122B). One sample (C180) was
derived from a post mortem lung from an HIV-negative patient; the
parasites were first enriched by successive filtration through 70 µm, 12 µm
and 8 µm filters.

Samples of Pneumocystis from the infected lungs of four
other mammalian hosts were used. These were Pneumocystis carinii sp. f.
muris (mouse derived), Pneumocystis carinii sp. f. mustelae (ferret
derived), Pneumocystis carinii sp. f. suis (pig derived), Pneumocystis carinii
sp. f. carinii (rat derived) and Pneumocystis carinii sp. f. rattus (rat derived).
These were enriched for parasites prior to DNA extraction.
DNA Extraction 

DNA was extracted from an enriched parasite preparation by
proteinase K digestion, followed by phenol-chloroform extraction. The
DNA was purified and concentrated using a DNA binding resin (Promega
Wizard DNA Clean-Up System).
DNA Amplification 

In general the following conditions were used in all PCR
reactions. The final concentration of the reaction mix was 50mM KCl,
10mM Tris (pH 8.0), 0.1% Triton X-100, 3mM MgCl2, 400µM of each
deoxynucleoside triphosphate, 1µM of each oligonucleotide primer and
0.025U of Taq polymerase (Promega) per ml. A total of forty cycles was
used, with 10 cycles at 94°C for 1.5min (denaturation), annealing at a
temperature between 48°C and 55°C, dependent on primer Tm and
required stringency of reaction, for 1.5min and 72°C for 2min (extension),
followed by 30 cycles at 94°C for 1.5min, 63°C for 1.5min and 72°C for
2min (the increased annealing temperature reflecting the inclusion of the
EcoRI site at the 5' end of the primers). Where there was no EcoRI site in the
primer or where particularly low stringency was required, all 40 cycles were
carried out at the lower annealing temperature. A positive control of rat
Pneumocystis DNA (rat 1458 or rat 1189) was included in each PCR
reaction. Negative controls of no added template DNA were included after
each sample to monitor for cross contamination. In later PCR reactions,
when degenerate primers were being used, a negative control of human
DNA (Sigma), at a final concentration of 0.8ng/µl, was included to monitor
for non-specific amplification of human DNA, which was unavoidably
co-extracted with all human Pneumocystis DNA samples. The primers used
are shown in Table 1 herein (and Table 1 of Lugli et al., 1997).
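The two-stage cycling scheme described above can be written down as a small data structure, which makes the cycle count and minimum run time easy to check. The class and function names below are hypothetical, and the sketch covers only the hold times given in the text (ramp times and any initial or final holds are not stated and are therefore left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # ((temperature in degrees C, minutes), ...) for one cycle


def cycling_program(low_anneal_c: float, primers_have_ecori_tail: bool = True) -> list:
    """Build the cycling program: 94 C / anneal / 72 C for 1.5, 1.5 and 2 min."""

    def stage(cycles: int, anneal_c: float) -> Stage:
        return Stage(cycles, ((94.0, 1.5), (anneal_c, 1.5), (72.0, 2.0)))

    if not primers_have_ecori_tail:
        # No EcoRI site (or very low stringency wanted): all 40 cycles at the lower temperature.
        return [stage(40, low_anneal_c)]
    # 10 cycles at the lower annealing temperature, then 30 cycles annealing at 63 C.
    return [stage(10, low_anneal_c), stage(30, 63.0)]


def total_cycles(program: list) -> int:
    return sum(s.cycles for s in program)


def hold_minutes(program: list) -> float:
    """Total time spent at the programmed temperatures, excluding ramping."""
    return sum(s.cycles * sum(minutes for _, minutes in s.steps) for s in program)


program = cycling_program(low_anneal_c=50.0)
print(total_cycles(program), hold_minutes(program))  # 40 200.0
```

The value of 50 °C is only an example from within the stated 48°C to 55°C range; the appropriate temperature depends on the primer pair.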

All PCR products were electrophoretically separated on
1.2% or 1.5% agarose gels containing ethidium bromide and visualised under
ultraviolet light.

Determination of the complete sequence of a copy of P.carinii sp. f.
hominis PRT1 gene
A number of different approaches are available for the
isolation of the complete gene sequence of a P.carinii sp. f. hominis PRT1
gene. Some of the possible approaches are described below in detail.

DNA and RNA are prepared from P.carinii sp. f. hominis
organisms, obtained from either bronchoalveolar lavage samples from
P.carinii infected patients or from post-mortem lung samples.
i) P. carinii sp. f. hominis genomic library

A P. carinii sp. f. carinii genomic library is constructed in λFIX
and this is screened with the cloned fragment of PRT1.
Positive recombinant phage are analysed by further rounds of
screening, and full length clones selected for analysis. The
arrangement of introns within the gene sequence is
determined. The genomic organisation of copies of PRT1 is
elucidated, and in particular the relationship with gene copies
of MSG. The chromosomal organisation of different PRT1
copies is examined, including the analysis of copies which
are in the subtelomeric regions and others which are at an
internal location.
ii) Expressed copies of PRT1

Two different approaches can be used to examine
transcribed copies of PRT1. In the first, Rapid
Amplification of cDNA Ends (RACE) is used to extend 5'- and
3'- of the cloned fragment of PRT1, using total RNA or poly
A+ RNA from the enriched parasite preparation. Primers are
designed to the sequence of the cloned fragment for use in
this technique. The second approach is the construction of a
cDNA library in λZAP from P.carinii sp. f. hominis, which is
then screened with the cloned fragment. Different
recombinant clones are compared for variation in sequence
and used for expression studies.

Expression

i) Expression of cloned fragment of P.carinii sp. f. hominis
PRT1 (H13)

The known portion of the catalytic domain is subcloned into
the pET32a expression vector and expressed in an E. coli
expression system. Recombinant protein is purified and used
to raise polyclonal antiserum in rabbits. In addition, synthetic
peptides designed to the PRT1 derived amino acid sequence
are used in the production of antibodies.

ii) Expression of the complete gene sequence and fragments of
the gene spanning different domains.

Recombinant protein is expressed and purified from different
domains and from the complete sequence, for use in the
production of antibodies, and in biochemical and
immunohistochemical studies.
Biochemical studies

Biochemical studies are performed to determine the substrate
specificity of the protease and the optimum conditions (e.g. pH, metal
cofactors) for proteolytic activity. This provides an in vitro system for the
testing of inhibitors to the PRT1 protease. Crystallisation of the
recombinant protein is carried out and the 3-D structure of the protein
determined by X-ray crystallography and compared with the 3-D structure of
the four other subtilisin-like serine proteases whose structure has
previously been determined. These structural data can be used for purposes
including the design of specific inhibitors of PRT1, and the prediction of
antigenically important epitopes.

Immunohistochemistry 

Antibodies raised to the recombinant PRT1 protein or to
synthetic peptides can be used in the analysis of the subcellular
localisation of PRT1 in P.carinii organisms, using both light microscopy and
electron microscopy with immunogold.

Table 1

Primer            Sequence

Pcprot1d/RI       GGGAATTCTA T C T A C G NTGV A C C NTGGGGNCC
Pcprot16d/RI      GGGAATTCCA C fGgiACi c A GiTG T c GCiGG
Pcprot17d/RI      GGGAATTCA c r G A Td T c G T CCAiGTiA G A c ' A T ! c iGG
Pcprot18d/RI      GGGAATTCTAiGC G A TciAi T c TT»CC A G A TA iCC
Pcprot24d/RI      GGGAATTC G A CC A C GAATA T C GTAGAAGC
Pcprot25d/RI      GGGAATTCGTTTT T C GG° A A T C G A T C GAGG A T GG
Pcprot26d/RI      GGGAATTC A T GCAA T G AGGTV C A G GAAGCAGA
Pcprot31/RI       GGGAATTCGAAGATGTTGATATTGAGGAG
Pcprot32/RI       GGGAATTCATCGTCTCTTATCGCACCC
Pcprot33/RI       GGGAATTCTCAACTCAACTAATACC
Pcprot39/RI       GGGAATTCAGGAATGATTTTTGTGGGCT
73jEx4/RI         GGGAATTCTTATGGAACAGCTGTTTCC
73jEx5/RI         GGGAATTCATCAATAGACTCTCCG
PcprotH34/RI      GGGAATTCTTGCGAATATTATCCGGGC
PcprotH35/RI      GGGAATTCGCACTTCCACCTGCATATG

Oligonucleotide Sequences. Note that I = inosine and N = any base in degenerate sequences.
The oligonucleotides above have SEQ ID NOS: 1-15, according to the order in which they appear in
the above table.

Single round PCR on Rat Variant, Mouse, Ferret and Pig derived P.carinii
Single round PCR on P.carinii sp. f. rattus and P.carinii sp. f.
muris samples gave strong amplification products at the same Mr as the rat
P.carinii positive control. Primers used were Pcprot1/RI and Pcprot3/RI.
Sequence data are shown in Figure 2.

Single Round PCR on Human Post Mortem Sample using Redesigned
Primer

New primers were designed based on regions of homology of
the newly obtained rat variant P. carinii and mouse P. carinii PRT1
sequences with the rat prototype P. carinii sequence at both the DNA level
and amino acid level. These were not fully degenerate, given that
Pneumocystis DNA shows a high AT bias (60-70%); unless the sequence
data suggested otherwise, only A or T was used at potentially degenerate
sites (as seen in the amino acid sequences). These new primers were
used in reactions with one another and previously used primers. Of these
reactions, only Pcprot16d/RI and Pcprot26d/RI gave a clear positive
product at the expected Mr, close to that of the rat P. carinii positive control
(~600 bp). The primers used were Pcprot25d/RI + Pcprot26d/RI;
Pcprot1d/RI + Pcprot26d/RI; Pcprot16d/RI + Pcprot26d/RI;
Pcprot25d/RI + Pcprot17d/RI; Pcprot25d/RI + Pcprot18d/RI;
Pcprot25d/RI + Pcprot24d/RI. The PCR products from the reactions were
cloned and sequenced. Of the clones sequenced, one contained an insert
which showed homology to the PRT1 gene. Sequence data over the
catalytic domain is shown in Figures 2 and 3.

                     13      12      49-53
Mouse P. carinii     14      ;s      7       27-28   43-46
Human P. carinii     24      18      18      20      42      67

Table showing percentage divergence from prototype rat-derived
Pneumocystis (P. carinii sp. f. carinii). mt LSU rRNA - mitochondrial large
subunit rRNA; mt SSU rRNA - mitochondrial small subunit rRNA. Values
for Variant rat P. carinii from two clones; values for Mouse P. carinii from
three clones. DNA divergence calculated with Jukes-Cantor correction
method. Protein divergence calculated using Kimura protein distance.
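For readers unfamiliar with the two distance measures named in the table legend, the sketch below shows their standard textbook forms: the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) for DNA and Kimura's empirical protein distance d = -ln(1 - p - 0.2p²), where p is the observed proportion of differing sites. The text does not say which software or exact variants were used, so the function names and the example values of p are illustrative assumptions and are not taken from the table.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected DNA distance from the observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical protein distance from the observed proportion of differences p."""
    inner = 1.0 - p - 0.2 * p * p
    if p < 0.0 or inner <= 0.0:
        raise ValueError("p is outside the range where the Kimura formula is defined")
    return -math.log(inner)


# Illustrative values of p only.
print(round(jukes_cantor_distance(0.30), 4))    # 0.3831
print(round(kimura_protein_distance(0.50), 4))  # 0.7985
```

Both corrections give distances larger than the raw proportion of differences, because they allow for multiple substitutions at the same site.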

The above table shows that the PRT1 gene differs between
P. carinii from different host organisms by far more than many other genes
so far studied. Thus in P.carinii sp. f. hominis the PRT1 DNA sequence is
around twice as divergent from P.carinii sp. f. carinii compared to other
sequences, and the amino acid sequence is over three times as divergent
as the arom sequence. This is even more striking given that the PRT1
data are taken from the catalytic domain, which should contain the highest
level of conservation (catalytic, substrate binding, oxyanion hole and
disulphide bridge residues). A similar level of divergence has previously
been observed in the MSG (also called Glycoprotein A; gpA) genes.
Indeed, early attempts to amplify some portions of gpA/MSG from P.carinii
sp. f. hominis by PCR using primers based on the P.carinii sp. f. carinii
sequence failed (Kovacs et al., 1993; Wright et al., 1994).

A high level of divergence is also seen in the PRT1
sequences from P.carinii sp. f. rattus and P.carinii sp. f. muris, where the
PRT1 DNA sequences are two to four times as divergent as the other
sequences and the mouse P. carinii PRT1 amino acid sequence is over six
times more divergent than that of arom.

The homology of the amino acid sequences from all three
types of Pneumocystis to the subtilisin-like serine proteases is high. Of the
known conserved residues, most can be seen to be conserved in the PRT1
sequences (where the data are available). Certainly in the P. carinii sp. f.
hominis PRT1 amino acid sequence there is greater conservation of the
negatively charged amino acids at the substrate-binding face than is seen
in the P.carinii sp. f. carinii sequence. Although the homology to the
subtilases is unmistakable, there is considerable variation to be seen
between the PRT1 sequences. This presumably reflects differences in
substrate specificity, whether the substrate is a host protein (or proteins) or
a parasite protein (e.g. gpA/MSG).

The function of the subtilisin-like serine proteases so far
studied is in the specific endoproteolytic processing of precursor proteins to
their active form. Although the precise function of many subtilases is yet to
be determined, some fungal homologues have been shown to be vital to
cell viability or normal function. Thus krp in S. pombe has been shown to
be vital to cell viability and disruption of XPR6 in Y. lipolytica causes
aberrant growth and morphology. Parallels may also be drawn between
Gp63 in Leishmania and PRT1 in Pneumocystis, as discussed in the
introduction. The functions of the PRT1 proteins are not yet fully
established, but they seem likely to be important to the life-cycle and/or the
pathogenesis of the organism. The cloning of this gene, most especially
from P.carinii sp. f. hominis, is thus a step towards the design of an
effective anti-Pneumocystis drug.
Generation of anti-PRT1 antibodies 

Polyclonal antiserum was generated in rabbits to synthetic 
peptides, designed to the Pneumocystis carinii sp. f. carinii PRT1



WO 98/39424 



PCT/GB98/00704 




sequence. Regions of the protein which were likely to be immunogenic
were predicted using the appropriate software, and peptides (15-mers) to
six different regions were synthesized. A mixture of six synthetic peptides
was administered by subcutaneous injection to rabbits (New Zealand
white). An antibody response was elicited by standard procedures, using
Freund's complete adjuvant for the first injection and Freund's incomplete
adjuvant for subsequent injections.

The resulting polyclonal antisera were tested against the
peptides. The greatest cross-reactivity of the antisera was found with
Peptide 7, designed to a region of the catalytic domain (amino acid
residues 424 - 438 of the PRT1(73j) sequence) and with Peptide 9,
designed to the pro-domain (amino acid residues 64 - 78 of the PRT1(73j)
sequence).

Peptide sequences
EXAMPLES

Example 1

Expression of portions of the rat-derived P. carinii (P. carinii sp. f.
carinii) PRT1(73j) gene.

The E. coli expression vector pET32a (Novagen, Madison,
WI) was used. This vector contains an inducible T7lac promoter, a 6-His
tag and a multiple cloning site, and the recombinant protein is expressed as a
fusion protein with the Trx-tag thioredoxin protein (109 amino acids).
Recombinant thioredoxin fusion proteins are generally more soluble and
remain in the E. coli cytoplasmic fraction. Three different regions of the
PRT1(73j) gene were cloned into pET32a: i) Cat2f1, a portion of the
catalytic domain, 585bp in length, from base 790 to base 1375; ii) F1a1j, a
portion of the pro-domain, 255bp in length, from base 120 to base 375; iii)
G1b1c, a portion of the P domain, 384 bp in length, from base 1515 to
base 1899.
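The stated fragment lengths can be cross-checked against the base coordinates with simple arithmetic. The sketch below is only a consistency check: the dictionary layout and function name are hypothetical, and it assumes each length is the difference between the two stated coordinates, which is what the figures in the text imply.

```python
# name: (start base, end base, length in bp as stated in the text)
FRAGMENTS = {
    "Cat2f1": (790, 1375, 585),   # catalytic domain portion
    "F1a1j": (120, 375, 255),     # pro-domain portion
    "G1b1c": (1515, 1899, 384),   # P domain portion
}


def span_bp(start: int, end: int) -> int:
    """Length implied by the coordinates, taken as end minus start."""
    return end - start


for name, (start, end, stated) in FRAGMENTS.items():
    length = span_bp(start, end)
    # A length divisible by 3 corresponds to a whole number of codons.
    print(name, length, length == stated, length // 3, length % 3 == 0)
```

All three lengths agree with the coordinates and are multiples of three, corresponding to 195, 85 and 128 codons respectively; whether each insert is in frame with the thioredoxin tag also depends on the cloning junction, which is not described here.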

The specific fragments were amplified by PCR from the
PRT1(73j) sequence as follows: i) Cat2f1 using primers Pcprot39/RI and
73jEx4/RI; ii) F1a1j using primers Pcprot31/RI and Pcprot32/RI; iii) G1b1c
using primers Pcprot33/RI and 73jEx5/RI (see Table 1). All primers
included an EcoRI site at the 5' end to facilitate cloning. The fragments were
initially cloned into the plasmid vector pUC, linearized with EcoRI and
treated with alkaline phosphatase, to produce a stable, high copy number,
recombinant plasmid. The recombinant DNA was then subcloned into the
EcoRI site of the expression vector pET32a.
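The 5' EcoRI tail on the primers can be checked programmatically. The sketch below uses five of the non-degenerate primers from Table 1; the pairing of names with sequences follows the order in which they appear in that table, which is an assumption about the flattened layout, and the function names are hypothetical. The two leading G residues are treated here as a short 5' clamp preceding the GAATTC recognition site.

```python
ECORI_SITE = "GAATTC"

# Non-degenerate primers from Table 1 (name-to-sequence pairing assumed from table order).
PRIMERS = {
    "Pcprot31/RI": "GGGAATTCGAAGATGTTGATATTGAGGAG",
    "Pcprot32/RI": "GGGAATTCATCGTCTCTTATCGCACCC",
    "Pcprot33/RI": "GGGAATTCTCAACTCAACTAATACC",
    "73jEx4/RI": "GGGAATTCTTATGGAACAGCTGTTTCC",
    "73jEx5/RI": "GGGAATTCATCAATAGACTCTCCG",
}


def split_tail(primer: str, clamp: str = "GG") -> tuple[str, str]:
    """Split a primer into its 5' tail (clamp + EcoRI site) and the gene-specific part."""
    tail = clamp + ECORI_SITE
    if not primer.upper().startswith(tail):
        raise ValueError("primer does not start with the expected EcoRI tail")
    return tail, primer[len(tail):]


def at_fraction(dna: str) -> float:
    """Fraction of A or T bases in a non-empty DNA string."""
    return sum(base in "AT" for base in dna.upper()) / len(dna)


for name, sequence in PRIMERS.items():
    tail, specific = split_tail(sequence)
    print(name, tail, specific, len(specific), round(at_fraction(specific), 2))
```

Separating the tail from the gene-specific portion is useful because only the latter anneals to the template in the first cycles, which is consistent with the lower annealing temperature used for the initial 10 cycles of the PCR.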

2. Transformation of E. coli with recombinant plasmids 

E. coli DH5α competent cells were transformed with the
recombinant plasmids. The cells were transformed with recombinant pUC
plasmids, and also recombinant pET32a plasmids. The recombinant
expression vector pET32a constructs were also transferred into E. coli DE3
(BL21) cells, for expression of the recombinant peptides.

25 3. Expression of recombinant PRT1 polypeptides 

The recombinant pET32a constructs, transformed into E. coli
DE3(BL21), were induced with IPTG, and the bacteria were grown for 3 to 4
hours. The cells were collected by centrifugation and disrupted by
sonication. The bacterial proteins were separated by SDS-PAGE and
electrophoretically transferred to a nitrocellulose filter. The immobilised
proteins were cross-reacted with anti-thioredoxin antibody (Sigma), and the
bound antibody was visualised with a swine anti-rabbit immunoglobulins
secondary antibody, conjugated to alkaline phosphatase. A band of the
expected size (24 kDa) was seen in the control vector pET32a (lane 1),
corresponding to the thioredoxin fusion protein and the His-tag. Bands
corresponding to the expected sizes of the recombinant PRT1 protein
fragments were observed (Figure 7, lanes 2 and 3).
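The text does not state the expected sizes of the fusion bands. A rough estimate can be sketched as below, under loudly stated assumptions: an average residue mass of 110 Da (a common rule of thumb, not a value from this document), an in-frame insert, and the 24 kDa control band taken as the mass of the tag portion. The function name `estimated_fusion_mass_kda` is hypothetical, and the result is an approximation only, since the vector-encoded residues downstream of the cloning site are not modelled.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)
TAG_MASS_KDA = 24.0          # control band size reported in the text


def estimated_fusion_mass_kda(insert_bp, tag_kda=TAG_MASS_KDA):
    """Rough fusion protein mass: tag mass plus insert residues x 110 Da."""
    if insert_bp <= 0 or insert_bp % 3 != 0:
        raise ValueError("insert length must be a positive multiple of 3 bp")
    residues = insert_bp // 3
    return tag_kda + residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Insert lengths (bp) as quoted for the three PRT1(73j) fragments.
    for name, bp in (("Cat2f1", 585), ("F1a1j", 255), ("G1b1c", 384)):
        print(f"{name}: about {estimated_fusion_mass_kda(bp):.1f} kDa")
```

Estimates of this kind are only a guide to where a band should run on SDS-PAGE; apparent mobility can differ from the calculated mass.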

4. Preparation of polyclonal mono-specific antibodies

Polyclonal antisera raised against the six synthetic peptides
were affinity purified. The peptide (Peptide 7 or Peptide 9) was covalently
linked to an amine-reactive support. Immunoglobulins which cross-reacted
with the peptide were specifically retained by the column, and subsequently
eluted. In this way, two polyclonal mono-specific antibodies were
produced, anti-Peptide 7 and anti-Peptide 9.

5. Cross-reactivity of polyclonal, mono-specific antibodies with 
recombinant PRT1 polypeptides 

Expressed proteins from E. coli DE3(BL21) transformed
with the recombinant expression vector for the pro-domain (F1a1j) or for the
catalytic domain (Cat2f1) were separated by SDS-PAGE and
electrophoretically transferred to nitrocellulose membrane. The
anti-Peptide 7 mono-specific antibody was shown to cross-react with the
recombinant Cat2f1 polypeptide, but not with F1a1j or with the protein
produced by the control plasmid pET32a. Likewise, the anti-Peptide 9
antibody specifically cross-reacted with the F1a1j polypeptide. These
results confirm the specificity of the mono-specific antisera to the two
distinct domains of the PRT1 protein.



6. Identification of PRT1 protein in P. carinii sp. f. carinii organisms

P. carinii sp. f. carinii organisms were extracted and enriched
from infected rat lungs. Organisms were disrupted by heating to 95°C in
denaturing solution and the proteins separated by SDS-PAGE, followed by
transfer to nitrocellulose filters. The immobilised proteins were
cross-reacted with the anti-Peptide 7 and the anti-Peptide 9 antibody. Bound
antibody was detected using an anti-rabbit secondary antibody, conjugated
to alkaline phosphatase. A single, major band, at 40 kDa, was seen with
each of the mono-specific antibodies. In addition, with the anti-Peptide 7
antibody, another major band was seen at 38 kDa, together with minor
bands at 98 kDa and 18 kDa. With the anti-Peptide 9 antibody, minor bands
at 200 kDa, 98 kDa and 43 kDa were observed. The predicted size of the full length
PRT1 protein ranges from 87 to 102 kDa. The proteins detected with the
mono-specific antibodies are assumed to be the products of autocatalysis
at a number of dibasic residues found in the PRT1 sequence.
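To illustrate what locating dibasic residues in a sequence involves, here is a minimal sketch that reports every adjacent pair of lysine (K) or arginine (R) residues in a one-letter amino acid string. The definition of a dibasic site as any K/R pair, and the function name `find_dibasic_sites`, are assumptions made for illustration; the text does not describe how the sites were identified. The example input is the 15-residue peptide listed as SEQ ID NO: 17 in the claims.

```python
import re

# Zero-width lookahead so that overlapping pairs (e.g. "KRK") are all reported.
DIBASIC = re.compile(r"(?=([KR][KR]))")


def find_dibasic_sites(protein):
    """Return (1-based position, pair) for each adjacent K/R pair."""
    return [(m.start() + 1, m.group(1)) for m in DIBASIC.finditer(protein.upper())]


if __name__ == "__main__":
    peptide = "ITSPSGVTSVLAHRR"  # SEQ ID NO: 17
    print(find_dibasic_sites(peptide))  # [(14, 'RR')]
```

Applied to a full-length PRT1 sequence, the same scan would give a list of candidate cleavage positions that could be compared with the observed band sizes.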

7. Sub-cellular localisation of the PRT1 protein in P. carinii sp. f.
carinii organisms

Sections of P. carinii sp. f. carinii infected rat lungs, formalin
fixed and embedded in paraffin, were prepared and incubated with
anti-Peptide 7 antibody. Bound antibody was detected using a swine
anti-rabbit immunoglobulin secondary antibody, conjugated to horseradish
peroxidase, and the organisms viewed by light microscopy. The specific
distribution of the antibody on the P. carinii sp. f. carinii organisms was
characteristic of surface localisation of the PRT1 protein in the organisms.

Example 2 

Expression of a portion of the human-derived P. carinii (P. carinii sp.
f. hominis) PRT1 gene

1. Construction of recombinant vector containing a portion of the
P. carinii sp. f. hominis PRT1 gene

The E. coli expression vector pET32a (Novagen, Madison,
WI) was used. This vector contains an inducible T7lac promoter, a 6-His
tag and a multiple cloning site, and the recombinant protein is expressed as a
fusion protein with the Trx-tag thioredoxin protein (109 amino acids). Thioredoxin
fusion proteins are generally more soluble and remain in the E. coli
cytoplasmic fraction.
A 367 bp portion of the cloned P. carinii sp. f. hominis
PRT1(H13) sequence was amplified by PCR with the primers
PcprotH34/RI and PcprotH35/RI, corresponding to position 111 to position
478 on the PRT1(H13) sequence, in the catalytic domain of the gene (see
Table 1). The primers included an EcoRI site at the 5' end to facilitate
cloning. The resulting fragment (H1a1a) was initially cloned into the EcoRI
site of the plasmid vector pUC, and then subcloned into the EcoRI site of
the expression vector pET32a.

2. Transformation of E. coli with recombinant plasmids

E. coli DH5α competent cells were transformed with the
recombinant pUC plasmid, and also with the recombinant pET32a
plasmid. The recombinant expression vector pET32a construct was
also transferred into E. coli DE3(BL21) cells, for expression of the
recombinant peptide.

3. Expression of recombinant P. carinii sp. f. hominis PRT1 peptide

The recombinant pET32a construct (H1a1a), transformed into
E. coli DE3(BL21), was induced with IPTG, and the bacteria were grown for
3 to 4 hours. The cells were collected by centrifugation and disrupted by
sonication. The bacterial proteins were separated by SDS-PAGE and
electrophoretically transferred to a nitrocellulose filter. The immobilised
proteins were cross-reacted with anti-thioredoxin antibody (Sigma), and the
bound antibody was visualised with a swine anti-rabbit immunoglobulins
secondary antibody, conjugated to alkaline phosphatase. A band of the
expected size (24 kDa) was seen in the vector pET32a control (lane 1),
corresponding to the thioredoxin fusion protein and the His-tag. A band
corresponding to the expected size of the recombinant P. carinii sp. f.
hominis PRT1 protein fragment was observed (Figure 7, lane 4).

4. Identification of PRT1 protein in P. carinii sp. f. hominis organisms

P. carinii sp. f. hominis organisms were extracted from
bronchoalveolar lavage fluid from a patient with P. carinii pneumonia. The
organisms were disrupted by heating to 95°C in denaturing solution and
the proteins separated by SDS-PAGE, followed by transfer to nitrocellulose
filters. The immobilised proteins were cross-reacted with the anti-Peptide 7
and the anti-Peptide 9 antibody. Bound antibody was detected using an
anti-rabbit secondary antibody, conjugated to alkaline phosphatase. Two
major bands, at 56 kDa and 49 kDa, were seen with each of the
mono-specific antibodies. In addition, minor bands at 116 kDa, 95 kDa, 86 kDa
and 39 kDa were seen with the anti-Peptide 7 antibody, and at 200 kDa,
116 kDa, 95 kDa, 86 kDa and 29 kDa with the anti-Peptide 9 antibody. The
proteins detected with the mono-specific antibodies are assumed to be the
products of autocatalysis at a number of dibasic residues found in the
P. carinii sp. f. hominis PRT1 sequence.
Figure Legends
Figure 2 

Nucleotide sequence alignments of part of the catalytic
domain of PRT1. 1-3paga, 1-3-73j and 1-3prp5e from P. carinii f. sp.
carinii; ratv5prt1 and ratv16prt1 from P. carinii f. sp. rattus; mouse1prt1,
mouse7prt1 and mouse13prt1 from P. carinii f. sp. muris; humanprt1 from
P. carinii f. sp. hominis.

Figure 3 

Amino acid sequence alignments of part of the catalytic
domain of PRT1, translated from the nucleotide sequences (Figure 2).
Pagaprt1, 73jprt1 and prp5eprt1 from P. carinii f. sp. carinii; ratv5prt1
and ratv16prt1 from P. carinii f. sp. rattus; mouse1prt1, mouse7prt1 and
mouse13prt1 from P. carinii f. sp. muris; humanprt1 from P. carinii f. sp.
hominis. U marks conserved amino acids; numbering according to full
amino acid sequence of cDNA clone 73j <8> ; an asterisk marks positions of
charge conservation in subtilases (see text).

Figure 4 

Alignment of the P. carinii sp. f. carinii PRT1 deduced amino
acid sequences from the genomic clone Paga, the cDNA clone 73j and the
three overlapping PCR products amplified from a cDNA library
corresponding to the 5' region (Prp5e), the central region (M14), and the 3'
region (Prp2g). The deduced amino acid sequences of PCR products
amplified from five different regions of the PRT1 gene family were also
aligned; the catalytic domain: Prp1a, Prp3a, Prp7a; the boundary of the
catalytic domain and the P-domain: Prp2c, Prp3c, Prp4c; the P-domain:
Prptaf2, Prpf4, Prp5f; the proline-rich region: Pcr-19, Pcr-14, Pcr-5, Pcr-3,
Pcr-1, L.am-1; the C-terminal region: Prpg4, Prp§3, Prp5g. Gaps were
introduced to maximize homology; identical amino acids are boxed.
Figure 6

Schematic representation of the P. carinii sp. f. carinii PRT1.
Patterned boxes represent different domains; small dots represent
hydrophobic regions (HR), diagonal lines indicate the catalytic domain
(CAT), woven pattern indicates the P-domain (P), vertical lines indicate the
proline-rich region (PR), squares indicate the serine-threonine rich region (STR).
Boxes that are defined by a shaded line (PR and STR) indicate length and
sequence variation in these regions. Diamonds indicate potential
glycosylation sites; (t) catalytic active site residues D214, H252, S423; (|)
conserved cysteine residues. Residues were numbered with reference to
the PRT1(73j) sequence.

Figure 7 

Recombinant PRT1 polypeptides, expressed in E. coli as
thioredoxin fusion proteins, separated by SDS-PAGE and cross-reacted
with an anti-thioredoxin antibody. E. coli DE3(BL21) transformed with:
lane 1: control plasmid pET32a; lane 2: F1a1j (portion of pro-domain of
P. carinii sp. f. carinii PRT1 gene); lane 3: G1b1c (portion of P-domain of
P. carinii sp. f. carinii PRT1 gene); lane 4: H1a1a (portion of catalytic
domain of P. carinii sp. f. hominis PRT1 gene).



CLAIMS

1. An isolated DNA comprising part or all of a PRT1 gene of a
non-rat infecting species of Pneumocystis carinii.

2. The DNA according to claim 1, comprising part or all of a
PRT1 gene of a human-infecting species of Pneumocystis carinii.

3. The DNA according to claim 1 or claim 2, wherein the PRT1
gene is in the form of cDNA.

4. An isolated DNA comprising a sequence shown in figure 1, or
a non-rat sequence shown in figure 2, or a sequence which hybridises to
either of these under stringent conditions.

5. The DNA according to claim 1 or claim 4, wherein the PRT1
gene has been mutated by point mutation, deletion, insertion, or other
means.

6. A recombinant vector containing the DNA according to any
one of claims 1 to 5.

7. A recombinant polypeptide which is part or all of a PRT1
gene product, expressed by a vector according to claim 6.

8. Synthetic peptides corresponding to antigenic portions of a
PRT1 gene product.

9. A synthetic peptide chosen from:



TWRDVQALIVETAVP (SEQ ID NO: 16)
ITSPSGVTSVLAHRR (SEQ ID NO: 17)
ESEGVPPPSYPFLSR (SEQ ID NO: 18)
ASTPLAAGVIALLLS (SEQ ID NO: 19)
FRGSSIVGNWTIDVE (SEQ ID NO: 20)
DNQHIFSIEKGVLED (SEQ ID NO: 21)



10. A method of producing antibodies specifically
immunoreactive with a Pneumocystis carinii protease, which method
comprises using a polypeptide according to claim 7 or a synthetic peptide
according to claim 8 or claim 9 to generate an immune response.

11. Antibodies produced by the method according to claim 10.



WO 98/39424 



PCT/GB98/00704 




12. Antibodies according to claim 11, which are monoclonal.

13. A method of screening for anti-Pneumocystis carinii
compounds, which method comprises providing a source of a recombinant
polypeptide expressed by part or all of a PRT1 gene or cDNA, and
contacting the compound with the recombinant polypeptide.

14. The method according to claim 13, wherein the recombinant
polypeptide is expressed at the surface of a cell.

15. The method according to claim 13 or claim 14, for screening
for protease inhibitors effective against Pneumocystis carinii.

16. The method according to any one of claims 13 to 15, using a
recombinant polypeptide corresponding to part or all of the catalytic
domain of the protease.

17. A cell transfected with a vector according to claim 6 and
expressing a polypeptide according to claim 7.

18. An engineered cell line expressing a recombinant polypeptide
from part or all of a PRT1 gene or cDNA, which may be mutated by point
mutation, deletion, insertion or other means, useful in the method
according to any one of claims 13 to 16.

19. The cell line according to claim 18, wherein the PRT1 gene or
cDNA is from a human-infecting Pneumocystis carinii species.

20. The method according to any one of claims 13 to 16, wherein
the PRT1 gene or cDNA has been mutated by point mutation, deletion,
insertion or other means.

21. A Pneumocystis carinii protease isolated using an antibody
according to claim 11 or claim 12.

22. A PRT1 clone for part or all of the human-infecting
Pneumocystis carinii PRT1 gene.
Figure 1

Human-derived Pneumocystis carinii subtilisin-like serine protease
(PRT1)(H13)

1 TGAAGTAGCT GCCGTTCGAA ATACTGTTTG TGGAATCGGT GTTGCATATG 
51 AATCCAAAGT TTCTGGTATT TTATTCTTTT TGACTGAATC TAATATAATA 
101 TCATTAAGGT TTGCGAATAT TATCCGGGCC TATAACAGAT CTTGATGAAG
151 CAGAATCGCT TAATTATGAT TTCCATAAAA ATCATATTTA TTCCTGTAGT
201 TGGGGACCTG ACGATGATGG AAAAACTGTT GATGGGCCTT CTTCTCTTGT 
251 TCTTAGAGCA CTTATTAATG GAGTAAATAA TGGAAGGAAT GGGTTGGGTT 
301 CTATCTATGT TTTTGCATCA GGAAATGGTG GAATATATGA AGATAACTGT 
351 AATTTCGATG GATATGCAAA TAGTGTGTTT ACCATTACTA TTGGTGGCAT 
401 AGATAAACAT GGAAAGCGTC TTAAATATTC TGAAGCGTGT TCTTCTCAGC 
451 TAGCTGTTAC ATATGCAGGT GGAAGTGCGG ATATATTTGT AACTTTAATT 
501 CTATTTTTTT TTATATAAAT TTATAATAAT TAGTATACTA CTGATGTTGG 
551 TACAAATAAA TGTACGAGTA GACATGGTGG TACC 
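A listing in this layout (a running base number followed by blocks of ten bases) can be joined into a single string before any further analysis. The sketch below is one way to do that, assuming each line starts with the 1-based index of its first base; the function name `parse_numbered_sequence` is hypothetical. It also rejects any character other than A, C, G or T, which helps catch transcription errors. The excerpt reproduces only the first two lines of the listing above.

```python
def parse_numbered_sequence(text):
    """Join lines such as '51 AATCC... TTCTG...' into one uppercase string.

    Each line must begin with the 1-based index of its first base; the index
    is checked against the running length, and non-ACGT characters are rejected.
    """
    pieces = []
    length = 0
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        index, blocks = int(parts[0]), parts[1:]
        if index != length + 1:
            raise ValueError(f"expected index {length + 1}, found {index}")
        chunk = "".join(blocks).upper()
        unexpected = set(chunk) - set("ACGT")
        if unexpected:
            raise ValueError(f"unexpected characters {sorted(unexpected)} at line {index}")
        pieces.append(chunk)
        length += len(chunk)
    return "".join(pieces)


EXCERPT = """
1 TGAAGTAGCT GCCGTTCGAA ATACTGTTTG TGGAATCGGT GTTGCATATG
51 AATCCAAAGT TTCTGGTATT TTATTCTTTT TGACTGAATC TAATATAATA
"""

if __name__ == "__main__":
    sequence = parse_numbered_sequence(EXCERPT)
    print(len(sequence), sequence[:10])
```

Checking the index on every line means a dropped or duplicated block is reported at the line where it occurs rather than silently shifting all later coordinates.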
Name: Paga   Len: 3130  Check: 3848  Weight: 1.00
Name: 73j    Len: 3150  Check: 2744  Weight: 1.00
Name: Prp5e  Len: 3150  Check: 2286  Weight: 1.00
Name: M14    Len: 3150  Check: 9011  Weight: 1.00
Name: Prp2g  Len: 3150  Check: 9244  Weight: 1.00



1 50
Paga ATGATTTTTA AGATACTCAT TACnTTTTC TTATACTGGA TCTATTTAGT 

73 j atgattttca agatactcct TAcrrnrrc ttatactgga TcrArrrAGT 

PrpSe ATGATTTTTA AGATACTCAT TACTTTTTTC TTATACTGGA TCTATTTAGT 
M14 



51 100
Paga TAGAGTAAGA TGTGAAATGA AGCCAGTAGA CTTTGAAAAT AATGATTATT 
73 j TAGAGTAAGA TGTGAAATGG TGCCAGTAGA CTTTGAGAAT AATGATTATT 
PrpSe TAGAGTAAGA TGTGAAATGG TGCCAATAGA CTTTGAGAAT AATGATTATT 
M14 



101 150 
Paga A. . .TCATTT TCATTTCTCA GAAGATGTTG ATATrGAGGA GTTrTCGCGG 
73j ATTATTATTT TCATCTCTCA GAAGATGTTG ATATTGAGGA GTTTTCTCGG 
PrpSe A. . .TCATTT TCATTTCTCA GGAGATGTTG ATATTGAGGA TTTTTCGAGG 

M14 

Prp2g 

151 200 
Paga GCGGTACGAT TGAAATATCA TATGAAAGTA GAATATCTGG ATAACCAGCA 
73 j GCGGTAGGAT TGAAATATCA TATGAAAGTA GATCATCTGG ATAACCACCA 
PrpSe GCGTTAGGAT TTAAACATTA TATGAAACTA GAACATCTGG ATAACCAGCA 
M14 



201 250 
Paga TATATTTTTC ATAGAAAaGG GTGTTTTAGA AGACGAAATT AAAGAAAAAA 
73 j TATATTTTTT ATAGAAAAGG GTGTTTTAGA AGACGAAATT AAAGAAAAAA 
PrpSe TATATTTTCT ATAGAAAAGG GTGTTTTAGA AGACGAAATT AAAGAAAAAA 
M14 

p*p2g 

251 300 
Paga TTGAGAATTA TTTTGGTTTA GAAAaAGGAA GAAaTGCAaT AGATGGGTTT 
73 j TTGAGAATTA TTTCAGTTTA GAAAAAGGAA GAAATGCAAT AGATGGGTTT 
Prp5e TTGAGAATTA TTTTGGTTTA GAAAAAGGAA GAAATGCAAT AAATGGGTTT 

H14 

?rp2g 

301 350 
Paga AATAGTGACA AACrTTTTTA TTATGAGAAA CAAAAGTTGG TCAAGCGAGT 
73 j AATAGTGACA AGCITTTTTA TTATGAGAAA CAAAAGTTGG TCAAGCGAGT 
PrpSe AATAGTGACA AGCTTTTTTA TTATGAGAAA CAAAAGTTGG TCAAGCGAGA 

M14 

Prp2g 

351 400 
Paga AAACAGGGGT' GTGATAAGAG ACGATATATA TTTTGATAAT GAAGGTCTTT 
73 j AAACAGGGGT GCGATAAGAG ACGATATATA TTTTGATAAC C!AAGATCTTT 
PrpSe AAACAGGGGT GTGATAAGAG ACGATATATA TTTTGATAAT AAAGGTCTTT 
M14 
Prp2g

401 450
Paga ATAATAGAAG AA.-..TTGTT AAGAATGTTG TAAAAGATTC GACGGGAGAT 
73 j ATAATGATGA AGAAATfGTC AATAATGTTC TAAAAGATCC GACGGTAGAT 
PrpSe ATAATAGAAG AG . . . TTGTT AAGAATGTTG TAAAAGATCC GACGGTAGAT 

M14 ...... 

Prp2g 

451 500 

Paga CAGGCG GT AGATTTAAGA GAGAAGATAA AGAAAATTAA 

73 j CAGGCGAAAA AATCGACGGA AGATTTAAAA GAGAGGTTAA AGGAAATTAA 

?rp5e CTGCCG GT AAATCTAACG CAGAAGTTAA AGAAAATTAA 

HI4 

Prp2g 

501 550
Paga AGAAGAATTA AATATAAGTG ACCCTTA1TT TGATAAACAA TGGTATTTGG 
73 j AAAAGAATTA GGTATAACTG ACCCTTGTTT TGATAAACAA TGGTATTTG . 
PrpSe AGAAGAATTA AATATAAGCA ACCCTTATTT TGATAAACAA TGGTATTTG . 

M14 .......... 

Prp2g 

551 600
Paga TATAGTTTAT TCTTTTTTTC AT CAAAATTT GATTTTTTAA TTAGTTCAAT 

73 5 TTTAAT 

PrpSe TTCAAT 

M14 

Prp2g 

601 650 
Paga AAGGATAAAG CTGGTGTAGA TATAAATGTT ACAGGTGTAT GGTTACAAGG 
73 j ACGGAAAAAC CTGGTGTAGA TATAAATGTT ACAGGTGTAT GGTTACAAG. 
PrpSe AAGGATAAAG CTGGTGTAGA TATAAATGTT ACAGGTGTAT GGTTACAAG. 

Ml 4 

P*p2g 

651 700 
Paga TTTGATATTT GTGTTGTTAC TCGCCTTTTA ATGGATTTTA GGGATAAAGG 

73 j GGATAACGG 

PrpSe GGATAAAGG 

M14 

Prp2g 



701 750
Paga GAAAAAATGT AACAGTTGCT ATTGTAGATG ATGGCTTAGA TTATACTAAC
73j GAAAAGGTGT AACAGTTGCC ATTGCAGATA ATGGCTTAGA TTATACTAAC
Prp5e GAAAAAATGT AACAGTTGCT ATTGTAGATG ATGGCTTAGA TTATACTAAC
M14
Prp2g

751 800
Paga AAGGATTTGG CTCCAAATTA TGTTTGAAAA ACTATTATGG AAATCACTAT
73j AAGGATTTGG CTCCAAATTA T
Prp5e AAGGATTTGG CTCCAAATTA T
M14
Prp2g

801 850 
Paga TTTAACTTTT TTCAGAATGC TAACGCTTCA TATAATTTTG CTTCTAAAAC 

73 i " . . AATTC ACAGGGTTCA TATGATTTTG TTTCTAAAAC 

PrpSe . AATGC TAACGCTTCA TATAATTTTG CTTCTAAAAC 

Ml 4 

p ^P2g 
851 900
Pag a TGGCGACCCA AAACCTG . . . AACCTTCTGA CACGCATGGT ACTAAATGTG 
73 j TGACGACCCA AACCCTAAGA GCTCTTCTGA CACGCATGGT ACTAGATGTG 
PrpSe TGGCGACCCA AAACCTG. . . GACCTTCGGA CACGCATGGT ACrAAATGTG 

Ml 4 

Prp2g 

901 950
Paga CAGGAGAAGT GGCAGGCGCC AGGAATGATT TTTGTQGGCT TGGTGTTGCA 
73 j CAGGAGAAGT GGCAGCCGCC AGGAATGATT TTTGTGGGCT TGGTGTTGCA 
PrpSe CAGGAGAAGT GGCAGGCGCC AGGAATGATT TTTGTGGGCT TGGTGTCGCA 

Ml 4 

Prp2g 

951 1000 
Paga TATGAATCTA ATATTTCAGG TATTTTTCTT TAATTGGTAC CTATCTAATA 

73 j TATGAATCTA ATATTTCAG. . 

PxpSe TATGAATCTA ATATTTCAG 

M14 

Prp2g .......... .......... 

1001 1050 
Paga TTGTTAAGGA TTACGATTTA TGCCTTCTGC TCGTTCGTCT TGGCTTGAAG 
73 j ....... .GA rrACGATTTT TGCCTTCTGG TCTCTCGTAT CATCTTGAGT 

PrpSe .... GA TTACGATTTA TGCCTTCTGC TCGTTCGTCT TGGCTTGAAG 

M14 

Prp2g .......... .......... 

1051 1100 
Paga GAGAAGCTCT TATTTACAAA TATGATGTTA ATCATATTTA TTCTTGTAGC 
733 CACTAGCTCT TAGTTATAAA CCGAATGTTA ATTATATTTA TTCTTGTAGC 
PrpSe GAGAAGCTCT TATTTACAAA TACGATGTTA ATCATATTTA TTCTTGTAGC 
M14 

Pxp2g 

1101 1150 
Paga TGGGGACCTG CCGATACTGG GAATTTAACT CAAGATATTT TTTATACTAC 
73 j TGGGGACCTC CTGGTGATGG ATATGCAGCT ATCCCAATGT ATCCTACTAC 
PrpSe TGGGGACCCG CCGATACTGG GAATTTAACT CAAGATATTT TTTATACTAC 

M14 . 

Prp2g 



1151 1200 
Paga TTATTCTGCA ATTATTAAAG GGATAAATCA AGGAAGGAAT GGTCTTGGTT 
73 j TTATTCTGCA ATTATTAAAG GGATAAAAGA AGGAAGGAAC GGTCTTGGCT 
PrpSe TTATTCTGCA ATTATTGAAG GGATAAATCA AGGAAGGAAT GGTCTTGGTT 

M14 , 

P*P2g 



1201 1250 
Paga CTATATACGT TTTCGGGTCA GGAAATGGTG GATATTTTGA TAATTGTAAT 
73 j CTATATATGT TTTTGGAACC GGAAATGGTG GATCATTGGA TGGTTGTAAT 
PrpSe CTATATACGT TTTCGGGTCA GGAAATGGTG GATATTTTGA TAATTGTAAT 
M14 

p^p2g • 

1251 1300 
Paga TACGATGGAT ATGCAAATAG CCCATATACT ATTACTATCG CTGCTATAGA 
73 j TACGATGGAT ATGCAAATAG TCCATATACT ATTACTATCG CTGCTATAGA 
PrpSe TACGATGGAT ATGCAAATAG CCCATATACT ATTACTATCG CTGCTATAGA 

M14 

P*P2g 
1301 1350
Paga TGCAGAAGAA AAAAGATTCA TATFTTCACA GCCATGTCCT TGTATTTTAG 
73 j TTCAGAAGAT AAAAATTTTT ATTTTTCAC-A GTCATGTCCT TGCATTTTGG 
PrpSe TGCAGAAGAA AAAAGATTCA TATTTTCAGG GCCATGTCCT TGTATTTTAG 

M14 . , 

Pr P2g . .......... .......... 

1351 1400 
Paga CTTCTACGTA TTCTGGCAAG CGTGGTGCAT ATATTGTAAT CTTTTCTTTT 

73j CTTCTACATA TTCTGGCGGA GAAAATGGAT CTATT 

PrpSe CTTCTACGTA TTCTGGCAAG CGTGGTGCAT AT ATT. .... 

M14 .......... .......... .......... . 

Prp2g 

1401 1450

Paga TTTTTATAAT AAATTGA7CG TTTTAGTATA CTACGGATGT TGGTACGACA 

73 j TATA CTACGGATCT TGGTAAGGAG 

PrpSe .......... . , TATA CTACGGATGT TGGTACGACA 

M.14 

Prp2g . .......... 

1451 1500 
Paga GAATGCAGCA TTAGACATAC TGGAAGTTCT GCTTCTACAC CTCTTGCTGC 
73} GGATGCACTA CTGAACATAC TGGAGCTTCT GCTTCTACAC CTCTTGCTGC 
PrpSe AAATGCAGCA TTAGACATAC TGGAAGTTCT GCTTCTACAC CTCTTGCTGC 



Ml 4 

Prp2g 

1501 1550 
Paga GGGTGTTATT GCTCTTCTTC TTTCAGCATG GTAAGAATAT CATTAAAATT 

73 j GGGTATTATT GCTCTTGTTC TTTCAGCGAA 

PrpSe GGGTGTTATT GCTCTTCITC TTTCAGCATG 

M14 

Prp2g 

1551 1600 
Paga ATTTGACTAA AAAATTAGTC CTAATCTTAC ATGGCGTGAT ATTCAAGCTT 

73 j TC CTAATCTTAC ATGGCATGAT GTTCAAGCGT 

PrpSe TC CTAATCTTAC ATGGCGTGAT ATTCAAGCCT 

M14 

?^p2g 

1601 1650
Paga TGATTGTGGA GACAGCTGTT CCATTTAATC CGAGTCATCC TGATTGGGAT 
73 j TGATTGTGGA AACAGCTGTT CCATTTAATT TGGAATATCC TGGATGGGAT 
PrpSe TGATTGTGGA GACAGCTGTT CCATTTAATC CGAGTCACCC TGATTGGGAT 

M14 

P*p2g 

1651 1700 
Paga GAT CTTCCTT CTGGACGTCG TTATAATAAT TTTTTCGGTT ATGGAAAACT 
73 j AAACTTCCTT CTGAACGTCA TTATAGTAAT AATTTTGGCT TTGGAAAGCT 
PrpSe GATCTTCCTT CTGGACGTCG TTATAATAAT TTTTTCGGTT ATGGAAAACT 



M14 

P*-p2g • 

1701 1750 
Paga AGATGCATAT AGAATGGTCG A A A AAG CAAG AACATTTAAA ACCTTAAATC 
73 j AGATGCGTAT AGAATGGTCG AAAGAGCAAA AAC&.TTTAAA ACATTAAATG 
PrpSe AGATGCATAT AGAATGGTCG AAAAAGCAAG AACATTTAAA ACCTTAAATC 

Ml 4 CATAT AGAATGGTCG AAAGAGCAAA AACATTTAAA ACATTAAATG 

Prp2g 

1751 1800 
Paga CTCAGACAAT GTTTTCAACT CAACTAATAC CACTTAATAA GAAATTTTCT
73 j CTCAGACAAT GTTTTCAACT CAACTAATAC CACTTAATAA GACATTTTCT 

PrpSe CTCAGACAAT GTTTTCAACT CAACTAATAC CACTTAATAA GAAATTTTCT 
Ml i CTCAGACAAT GTTTTCAACT CAACTAATAC AAATTAATAT GAAATTTCCT 

Prp2g 

1801 1850
Paga GAAAACGGTG GGCATATCAC AAGCAGTTTT TATATTCATC GTGGATATCC 
73) GAAAACGGTG GGCATATCAC AAGCACTTTT TATATTGATA GTGGATCTCC 
PrpSe GAGAACGGTG GGCATATCAC AAGCAGTTTT TATATTCATC GCGGATATCC 
Ml 4 GATCCCAGTA GACCTATCAC GAGCAGTTTT TATATTCATA GTGGATATCC 
Prp2g '. .......... 

1851 1900
Paga TAAGCATTAT AAATTTAAAA GTTTAGAGTA TCTTGGTGTT TCATTTCATT 
73 j TACGCATTAT AACTTTAAAA ATTTGGAATA TGTTGGTGTT TCATTTCATT 

PrpSe TAAGCATTA , 

M14 TACGCATTAT AACTTTAAAA ATTTGCAATG TCTTGGTGTT TCATTTCATT 
Vrp2q 

1901 1950 
Paga ATCAGCACCA AAGAAGAGGT CATCTAGAGT TTAATA1T AC CAGTCCTTCT 
73 j ATAAGCACCA ATATAAAGGT CATCTGGAGT TTAATATTAC CAGTCCTTCf 

PrpSe . . . . . » ...... 

M14 ATCAGCACCA AAAAAGAGGT CGTCTGGAGT TTAGTATTAC AAGCCCTGCT 



1951 2000 
Paga GGAGTTACTT CAGTATTAGC ACATAGACGT AATCGTGATA AACATGGTGG 
73 j GGAGTTACTT CAGTATTAGC ACATAGACGT ATTAATGATT ATAATAGTGG 

PrpSe 

M14 AATGTTACTT CAAAATTAGC ACGTGTACGT GTTCGTGATG AAGAAAGTGG 

?rp2g 

2001 2050
Paga CAGTATTCTT TGGACTTTTA TGACTGTAAA GCATTGGTAT TTTGTTTCAT 

73 j CAcrrrrcAT tggtttttta cgactgtaaa gcattg 

PrpSe 

M14 CACTTTTTCr TGGATTTTTA CGACTGTAAA GCATTG 

P^P2g 

2051 2100 
Paga T1TGTAAAAT AATAACTAAT GATTTTAGGG GAGAATCCAT TGTAGGTAAT 

73 j GG GAGAAACCAT TGTAGGTAAC 

PrpSe 

M14 .- GG GGGAAAAGAT TGTAGGTAAT 

P*p2g 

2101 2150 
Paga TGGACTATCG ATGTTGAAGA TAAAAAGGAT GAGAATCTAG ATGGTGGAGT 
73 j TGGACTATCG ATGTTGAAGA TGAAAAGGTT TCGAATCTAG ATGGTGAAAT 

PrpSe ; 

MM TGGACTATCG ATGTTGAAGA TGAAAAAGAT CCGAATCTAG ATGGTGAAGT 



Prp2g 

2151 2200 

Paga TTTTGATTGG CAACTTCATT TTTTCGGGGA GTCTTGTGAA TCA. . .GAAG 

73 j TTTTGATTGG CAACTTCATT TTTTCGGAGA GTCTATTCAT TCAAGTAAAG 

PrpSe .... 

Ml 4 TTTTAATTGG CAACTTCATT TTTTCGGAGA GTCrATTGAT TCAACAAAAG 

Prp2g 

2201 2250
Paga GCGTACCGCC TCCTTCATAT CCTTTTCTAT CTAGATATCC AACTACTACG 
Figure 5

73 j cagaacttca TCCTCCATAT ccrrrfAAGc CTCAA 

PrpSe , 

Ml 4 CACA...GCC TCCTCCATAT CCTTTTGTGC ATAAACAACC AACTACTATG 
Pr P2g 

2251 2300 
Paga CCTCCACCAG ATCCAGATGC TACACCTTCT CCAGATCTGG ATGCrAACCT 

73 j . 

PrpSe 

Ml 4 CCTCCGCCAG AACCAACTAC TACGCTTCCA TCAGATCCAG ATGCTACATC 
Prp2g 

2301 2350 

Paga TCAGCCAGAT TCAAATGCTG ACTCT C 

73 j ■ •.. 

PrpSe 

H14 TCTACCAGAT TTAAATGTTG CACCTTCGCC AGATTTAAAT GCTAACCCTC 
Prp2g 

2351 2400 
Paga AACCTCAACC AGATGTTAAG CCTCTGCCTT CATFAGATA? TGAGCCTCAA 

73j 

PrpSe , 

M14 AACCTCAACC AGATCCTGGG TCTCCGCCCT CATCAGATCC TGAGTCTCCG 
PJT?2g 

2401 2450

Paga CCTCCATCAG AACCAGATTC TAACCCTCCA TCAGATCTi'A GCTCTCAGCA 
73j CCTCCTTCAA AACCTGCGCC TCCATCAAAA CCAGATCCTA ACCCTCCATC 

Prp5e .. 

Ml 4 TCTTCATTAG AACCTGCGCC TCCATCAAAA CCAGATCCTA ACCCTCCATC 
Prp2g 

2451 2500 

Paga AGATCC AGATAC TTCGCTTTCA TCAAATGCAA 

73 j AGATCCTAGC TCTCAGCAAG ATTCAGATAC TTCGCTTTCA TCAACTCCAA 

PrpSe 

HI 4 AGATCCTAGC TCTCAGCAAG ATCCAGATAC TTCGCTTTCA TCAAATCCAA 
P*p2g 

2501 2550
Paga CTTCTACATC TTCATCAGAA CTACCACCAC TACCACCACC ACCGCCGCCA 

73 j CTTCTACATC TTCATCAAAA 

PrpSe 

M14 CTTCTACATC TTCATCAGAA CCACCACCAC TACCACCACC ACCGCCAC . . 
Prp2g 

2551 2600
Paga CCTGCACCTG CACCACCTGC ACCTGCACCA CCTCCACCAC CGCCGCCACC 

73 j 

PipSe , 

HI 4 .CTGCACCTG CACCGCCTCC ACCACCGCCG CCACCACCAT CTCGGCCGGA 
Prp2g .......... 

2601 2650
Paga ACCACCTCGG CCGGAACCAC AACCACAACC AGAGACACAA CCAGAGACAC 

73j 

PxpS* 

Ml 4 ACCAGAACCA GAACCGCGAC CAGAACCAAA ACCAAAACCA GAACCAGAAC 
Prp2g .......... , 

2651 2700 
Paga AACCAGAGAC ACAACCAGAG ACACAACCAG AGACACAACC ACCACAACCA 

73 j 

PrpSe . . .... 

M14 CAGAACCAGA ACCAGAACCA GAACTAGAAC TAGAACTAGA ACTAGAACTA 
P^p2g • • • • ■ ■ • • • 

2701 2750 

Paga CCACAACCAC CACAATCAGA GACACAACCA GAACCAGAAC CAGAACCAGA 

73j .......... 

PrpSe .......... 

M14 GAACCAGAAC CAGAACCAGA ACCAGAACCA GAACCAGAAC CAGAACCAGA 

Prp2g 

2751 2800
Paga ACCAGAACCA GAACCAGAGC CAGAGCCAGA GCCACAACCA GAACCAGAAC 

*"j ••• .......... 

PrpSe 

M14 GCCACAACCA GAGCCACAAC CAGAGCCAGA ACCACAACCA GAGCCACAAC 

?r P 2g • 

2801 2850
Paga CAGAGACACA ACCAGAGCCA CAACCACCAC AACCAGACCC ACAACCACCA 

73 j C TGTCACCACC ACCTACACCT 

PrpSe . 

Ml 4 CAGAGCCACA ACCACAACCA GAGCCACAAC CAGAGCCAGA ACCACAACCA 
Prp2g 

2851 2900
Paga GAACCAGAGC CACAACCAGA GCCACCTGCA TCTCCACCAA AACTACAACC 
73 j CAACCAAAGC CAGAACCAGA AC'CGGAACAG AAACCGACAT CAATAGCTTC 

PrpSe , , . 

Ml 4 CCGCTGCCAC AACCACCGCT GCCACCTGCA CCTCCACCAA AACCACAACC 

**p*s 

2901 2950 
Paga GGAACAAAAA CCAACATCAA TAACTTCATC TACATCTACG ACTTCATCGA 
73j ATCTACAACA TCAACTAATT TAATTCCACC AGCTCCCACA TCTTCATCAA 

PrpSe 

M14 GGAACAAAAA CCAACATCAA TAACTTCATC TACATCTACG ACTTCATCGA 
Prp2g ATCAA 

2951 3000 

Paga GCAAAACTAA AATATCAACC ACTCGAAAAG CTTCATGTAC TAT 

73 j GCAAAACTAA AACATCAACC ACTCGAAAAG CTTCATCTAC TA 

PrpSe 

Ml 4 GCAAAACTAA AATATCAACC ACT 

Prp2g GCAAAACTAA AATATCAACC ACTCGAAAAG CTTCATCTAC TAAAACTTCA 

3001 3050 

Paga AA CAGTCTTTAT AGGGCCATCT CCTACTGAGG GTGTTTCTAC 

73 j . CAA AAACCTCTAC ACGGCCGTCT CCTACTGAGG GTACTTTTAC 

Prp5e 

M14 

Prp2g TCTACTACAA AAACTTCTGC ACGGCCGTCT CCTACTGAGG GTACITTTAC 

3051 3100 
Paga TGGATCAAGT GCTTCTCATC TTTCATTCIT CGAAAAAAGG CATTTGTTAC 
73 j TGGATCAGGC TGTTCTCATC TTTCATTCTT CGAAAAAAGG CATTTGTTAC 

PrpSe » 

Ml 4 , 

Prp2g TGGATCAAGT GCTTCTCGTC TTTCATTCTT CGAAAAAAGG CATTTGTTAC 

3101 3150 
Paga TTCAAATGAT ATTATTGTTA TTCTTTTTCT TA TTTT TGGO TTACTCTTTT 
73 j TTCAGATGAT ATTATTGTTA TTCTTTTTCT TATTTTTGGG TTACTCTTTT 

PrpSe 
Figure 6